8 Generalized Linear Regression

glm(formula, family = ?)
NOTE THAT GLM here stands for generalized linear model (generalized linear regression), which is different from the general linear model.
Wikipedia says: Not to be confused with multiple linear regression, the general linear model, or general linear methods.
A generalized linear model (GLM) extends linear regression to response variables whose distribution belongs to the exponential family. A GLM has three parts:
- A linear predictor \(X\beta\) built from the explanatory variables (which may be categorical, counts, etc.)
- A distribution for the response
- A link function (here, \(g\)) that connects the linear predictor to the mean of the distribution
\[\mathbb{E}[Y] = \mu = g^{-1}(X\beta)\]
Examples: logistic regression, probit regression, Poisson regression
For logistic regression, the link is the logit: \(X\beta = \log{\frac{\mu}{1-\mu}}\)
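In R, a family object carries both the link function and its inverse, which makes the relationship easy to check directly:

```r
# R's family objects store the link function and its inverse.
fam <- binomial("logit")
mu  <- 0.75
eta <- fam$linkfun(mu)   # log(0.75 / 0.25) = log(3)
fam$linkinv(eta)         # recovers 0.75
```

The same pattern works for other families, e.g. `poisson()$linkfun` for the log link.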
GLMs are fitted by maximum likelihood, and model fit is summarized by the deviance: a smaller deviance means a better model. It is defined as:
\[D = -2\log{\frac{\text{Likelihood of Current Model}}{\text{Likelihood of Saturated Model}}}\]
For large samples, Wilks' theorem gives the approximate distribution of the difference in deviance between two nested models:
\[D_{reduced} - D_{full} \sim \chi^2_{df}, \quad df=k_{full} - k_{reduced}\]
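This test can be computed by hand with `pchisq`. The numbers below are illustrative; they match the null and residual deviance of the hypertension fit later in this section, where 3 parameters separate the two models:

```r
# Deviance difference between a reduced (null) and a full model,
# referred to chi-squared with df = difference in parameter counts.
D.reduced <- 14.1259   # null deviance (illustrative; see the fit below)
D.full    <- 1.6184    # residual deviance of the full model
pchisq(D.reduced - D.full, df = 3, lower.tail = FALSE)   # p-value
```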
8.1 Logistic Regression Revisited
Use family = binomial for logistic regression.
no.yes <- c("No", "Yes")
smoking <- gl(2, 1, 8, no.yes)  # gl(n, k, length, labels): n levels repeated in blocks of k
obesity <- gl(2, 2, 8, no.yes)
snoring <- gl(2, 4, 8, no.yes)
n.tot <- c(60,17,8,2,187,85,51,23)
n.hyp <- c(5,2,1,0,35,13,15,8)
data.frame(smoking, obesity, snoring, n.tot, n.hyp)
  smoking obesity snoring n.tot n.hyp
1 No No No 60 5
2 Yes No No 17 2
3 No Yes No 8 1
4 Yes Yes No 2 0
5 No No Yes 187 35
6 Yes No Yes 85 13
7 No Yes Yes 51 15
8 Yes Yes Yes 23 8
hyp.tbl <- cbind(n.hyp, n.tot-n.hyp)
glm.hyp <- glm(hyp.tbl~smoking+obesity+snoring, family = binomial("logit"))
summary(glm.hyp)
Call:
glm(formula = hyp.tbl ~ smoking + obesity + snoring, family = binomial("logit"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.37766 0.38018 -6.254 4e-10 ***
smokingYes -0.06777 0.27812 -0.244 0.8075
obesityYes 0.69531 0.28509 2.439 0.0147 *
snoringYes 0.87194 0.39757 2.193 0.0283 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14.1259 on 7 degrees of freedom
Residual deviance: 1.6184 on 4 degrees of freedom
AIC: 34.537
Number of Fisher Scoring iterations: 4
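The coefficients above are on the log-odds scale, so exponentiating them gives odds ratios. A short sketch (the data are rebuilt from above so the snippet runs on its own):

```r
# Refit the hypertension model, then express coefficients as odds ratios.
no.yes  <- c("No", "Yes")
smoking <- gl(2, 1, 8, no.yes)
obesity <- gl(2, 2, 8, no.yes)
snoring <- gl(2, 4, 8, no.yes)
n.tot   <- c(60, 17, 8, 2, 187, 85, 51, 23)
n.hyp   <- c(5, 2, 1, 0, 35, 13, 15, 8)
hyp.tbl <- cbind(n.hyp, n.tot - n.hyp)
glm.hyp <- glm(hyp.tbl ~ smoking + obesity + snoring,
               family = binomial("logit"))
exp(coef(glm.hyp))   # obesityYes and snoringYes roughly double the odds
```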
anova(glm.hyp, test = "Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: hyp.tbl
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 7 14.1259
smoking 1 0.0022 6 14.1237 0.962724
obesity 1 6.8274 5 7.2963 0.008977 **
snoring 1 5.6779 4 1.6184 0.017179 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
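Because the data here are grouped binomial counts, the residual deviance itself can be read as a goodness-of-fit statistic against the saturated model: a large p-value means no evidence of lack of fit. Using the numbers printed above:

```r
# Residual deviance 1.6184 on 4 df, referred to chi-squared(4).
pchisq(1.6184, df = 4, lower.tail = FALSE)   # large p-value: no lack of fit
```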
8.2 Poisson Regression
The response in Poisson regression is count data, for example:
- Aggregated counts of events over time
- Individual-level data with event indicators
The log link connects the mean count \(\lambda\) to the linear predictor:
\[ \log{\lambda} = X \beta\]
library(ISwR)
data(eba1977)
str(eba1977)
'data.frame': 24 obs. of 4 variables:
$ city : Factor w/ 4 levels "Fredericia","Horsens",..: 1 2 3 4 1 2 3 4 1 2 ...
$ age : Factor w/ 6 levels "40-54","55-59",..: 1 1 1 1 2 2 2 2 3 3 ...
$ pop : int 3059 2879 3142 2520 800 1083 1050 878 710 923 ...
$ cases: int 11 13 4 5 11 6 8 7 11 15 ...
fit.ps <- glm(cases~city+age, data = eba1977, family = poisson)
summary(fit.ps)
Call:
glm(formula = cases ~ city + age, family = poisson, data = eba1977)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.24374 0.20363 11.019 <2e-16 ***
cityHorsens -0.09844 0.18129 -0.543 0.587
cityKolding -0.22706 0.18770 -1.210 0.226
cityVejle -0.22706 0.18770 -1.210 0.226
age55-59 -0.03077 0.24810 -0.124 0.901
age60-64 0.26469 0.23143 1.144 0.253
age65-69 0.31015 0.22918 1.353 0.176
age70-74 0.19237 0.23517 0.818 0.413
age75+ -0.06252 0.25012 -0.250 0.803
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 27.704 on 23 degrees of freedom
Residual deviance: 20.673 on 15 degrees of freedom
AIC: 135.06
Number of Fisher Scoring iterations: 5
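Since eba1977 also records the population size pop, counts like these are usually modelled as rates by adding an offset(log(pop)) term, so that \(\log{\lambda} = \log(\text{pop}) + X\beta\). A minimal sketch on made-up data (the pop, cases, and expo values below are invented purely for illustration):

```r
# Rate model via an offset: counts adjusted for population at risk.
# Invented data: group "b" has exactly twice the event rate of group "a".
pop   <- c(1000, 2000, 1500, 3000)
cases <- c(10, 20, 30, 60)              # rates: 0.01, 0.01, 0.02, 0.02
expo  <- factor(c("a", "a", "b", "b"))
fit   <- glm(cases ~ expo + offset(log(pop)), family = poisson)
exp(coef(fit))   # (Intercept): baseline rate 0.01; expob: rate ratio 2
```

The same idea applied to the lung-cancer data would be glm(cases ~ city + age + offset(log(pop)), data = eba1977, family = poisson).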