Slim Shady

Reputation: 220

Estimate of the variance of the estimator for the effect of a predictor variable in a multiple linear regression model in R

  bweight gestwks            hyp sex
1    2974 38.5200004577637     0 female
2    3270 NA                   0 male
3    2620 38.150001525878899   0 female
4    3751 39.799999237060497   0 male
5    3200 38.889999389648402   1 male
6    3673 40.970001220703097   0 female

bweight=baby weight

gestwks=gestation period in weeks

hyp=presence of maternal hypertension

sex=sex of baby

I have this sample, and I have fitted a multiple linear regression model with the following code:

lm2 = lm(bweight ~ gestwks + hyp + male)

where male is a vector with 1 for male babies and 0 for female babies.

How do I find the unbiased estimate of the error variance sigma^2? Is the code:

summary(lm2)$sigma^2

going to give me the answer I'm looking for?

Also, how do I find the estimate of the variance of the estimator for the effect of hypertension?

For example, suppose the estimated effect of hypertension on the baby's weight is -200 (i.e. the presence of hypertension lowers the average weight by 200). What would be the estimated variance of the estimator of that effect?

Upvotes: 2

Views: 226

Answers (1)

StupidWolf

Reputation: 46938

Your example data:

df = structure(list(bweight = c(2974L, 3270L, 2620L, 3751L, 3200L, 
3673L), gestwks = c(38.5200004577637, NA, 38.1500015258789, 39.7999992370605, 
38.8899993896484, 40.9700012207031), hyp = c(0L, 0L, 0L, 0L, 
1L, 0L), sex = structure(c(1L, 2L, 1L, 2L, 2L, 1L), .Label = c("female", 
"male"), class = "factor"), male = c(0, 1, 0, 1, 1, 0)), row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

df$male = as.numeric(df$sex=="male")
lm2 = lm(bweight ~ gestwks + hyp + male,data=df)

What you want is the variance-covariance matrix of the coefficient estimates:

vcov(lm2)
            (Intercept)     gestwks         hyp       male
(Intercept)   8615153.6 -219476.110 -199723.227 119995.418
gestwks       -219476.1    5596.976    5093.248  -3283.549
hyp           -199723.2    5093.248   57215.841 -29278.523
male           119995.4   -3283.549  -29278.523  36980.334

The diagonal holds the variance of each coefficient estimator (for hypertension, that is the hyp entry, 57215.841). Taking the square root gives the standard errors shown by summary:

sqrt(diag(vcov(lm2)))
(Intercept)     gestwks         hyp        male 
 2935.15819    74.81294   239.19833   192.30271 

summary(lm2)

Call:
lm(formula = bweight ~ gestwks + hyp + male, data = df)

Residuals:
         1          3          4          5          6 
 1.218e+02 -1.058e+02  0.000e+00  7.105e-15 -1.598e+01 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -10304.13    2935.16  -3.511    0.177
gestwks        341.55      74.81   4.565    0.137
hyp           -240.19     239.20  -1.004    0.499
male           461.63     192.30   2.401    0.251
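
On the first part of the question: as far as I know, summary(lm2)$sigma is the residual standard error, sqrt(RSS / residual df), so squaring it should give the usual unbiased estimate of sigma^2. Below is a minimal sketch that checks this by hand on the six rows above and pulls out the variance for hyp directly. The names sigma2_hat, X, V, var_hyp and se_hyp are just illustrative, and the data frame is rebuilt with data.frame() for brevity rather than with the structure() call above.

```r
# Rebuild the six example rows (row 2 has a missing gestwks and is dropped by lm)
df <- data.frame(
  bweight = c(2974, 3270, 2620, 3751, 3200, 3673),
  gestwks = c(38.5200004577637, NA, 38.1500015258789, 39.7999992370605,
              38.8899993896484, 40.9700012207031),
  hyp     = c(0, 0, 0, 0, 1, 0),
  sex     = factor(c("female", "male", "female", "male", "male", "female"))
)
df$male <- as.numeric(df$sex == "male")
lm2 <- lm(bweight ~ gestwks + hyp + male, data = df)

# Unbiased estimate of the error variance: RSS / (n - p)
sigma2_hat <- sum(residuals(lm2)^2) / df.residual(lm2)
print(all.equal(sigma2_hat, summary(lm2)$sigma^2))

# The coefficient covariance matrix is sigma2_hat * (X'X)^-1
X <- model.matrix(lm2)
V <- sigma2_hat * solve(crossprod(X))
print(all.equal(V, vcov(lm2), check.attributes = FALSE))

# Variance and standard error of the estimator for the hypertension effect
var_hyp <- vcov(lm2)["hyp", "hyp"]
se_hyp  <- sqrt(var_hyp)
print(c(variance = var_hyp, std_error = se_hyp))
```

Note that with only 5 complete rows and 4 coefficients there is a single residual degree of freedom, so these variance estimates are very imprecise on this small sample; the same code applies unchanged to the full data set.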

Upvotes: 1
