Reputation: 325
I'm running a linear regression on all predictors (I have 384 of them), but summary only gives me 373 coefficients. Why doesn't R return all of the coefficients, and how can I get all 384?
full_lm <- lm(Y ~ ., data=dat[,2:385]) #384 predictors
coef_lm <- as.matrix(summary(full_lm)$coefficients[,4]) #only gives me 373
Upvotes: 0
Views: 3667
Reputation: 5923
For example, if some columns in your data are linear combinations of others, their coefficients will be NA, and if you index the summary the way you do, those rows are omitted automatically.
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
d <- b + 2*c
e <- lm(a ~ b + c + d)
Printing e gives
Call:
lm(formula = a ~ b + c + d)
Coefficients:
(Intercept)            b            c            d
   0.088463    -0.008097    -0.077994           NA
But indexing the summary drops the row for d:
> as.matrix(summary(e)$coefficients)[, 4]
(Intercept)           b           c
  0.3651726   0.9435427   0.3562072
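To get one entry per model term, NA included, coef() on the fitted model can be used instead of the summary table. The following is only a sketch built on the toy example above: the seed is arbitrary, and the names all_coef, p_tab and p_full are made up for illustration.

```r
set.seed(1)                    # arbitrary seed, only for reproducibility
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
d <- b + 2 * c                 # exact linear combination of b and c
e <- lm(a ~ b + c + d)

# coef() keeps aliased terms, so d shows up as NA
all_coef <- coef(e)
length(all_coef)               # one entry per term, including d

# Pad the p-values back to full length, matching rows by name
p_tab  <- summary(e)$coefficients
p_full <- setNames(rep(NA_real_, length(all_coef)), names(all_coef))
p_full[rownames(p_tab)] <- p_tab[, "Pr(>|t|)"]
p_full
```

Matching by name rather than by position keeps the padded vector aligned with the model terms no matter which rows the summary dropped.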
Upvotes: 0
Reputation: 1709
First, summary(full_lm)$coefficients[,4] returns the p-values, not the coefficients. Now, to actually answer your question: I believe some of your variables drop out of the estimation because they are perfectly collinear with others. If you run summary(full_lm), you will see that the estimates for these variables are NA in all fields, so they are not included in summary(full_lm)$coefficients. As an example:
x <- rnorm(1000)
x1 <- 2 * x
x2 <- runif(1000)
eps <- rnorm(1000)
y <- 5 + 3 * x + x1 + x2 + eps
full_lm <- lm(y ~ x + x1 + x2)
summary(full_lm)
#Call:
#lm(formula = y ~ x + x1 + x2)
#
#Residuals:
#     Min       1Q   Median       3Q      Max
#-2.90396 -0.67761 -0.02374  0.71906  2.88259
#
#Coefficients: (1 not defined because of singularities)
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  4.96254    0.06379   77.79   <2e-16 ***
#x            5.04771    0.03497  144.33   <2e-16 ***
#x1                NA         NA      NA       NA
#x2           1.05833    0.11259    9.40   <2e-16 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 1.024 on 997 degrees of freedom
#Multiple R-squared: 0.9546, Adjusted R-squared: 0.9545
#F-statistic: 1.048e+04 on 2 and 997 DF, p-value: < 2.2e-16
coef_lm <- as.matrix(summary(full_lm)$coefficients[,1])
coef_lm
#(Intercept) 4.962538
#x           5.047709
#x2          1.058327
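To find out which predictors were dropped and what they are collinear with, one option is to look at the NA entries of coef() and at alias(). This is a sketch on the same simulated data, not a diagnosis of the original 384-column data set; the seed is arbitrary and the names dropped and al are made up for illustration.

```r
set.seed(42)                   # arbitrary seed, only for reproducibility
x   <- rnorm(1000)
x1  <- 2 * x                   # perfectly collinear with x
x2  <- runif(1000)
eps <- rnorm(1000)
y   <- 5 + 3 * x + x1 + x2 + eps
full_lm <- lm(y ~ x + x1 + x2)

# Terms whose coefficient could not be estimated
dropped <- names(which(is.na(coef(full_lm))))
dropped

# Each row expresses a dropped term as a combination of the kept terms
al <- alias(full_lm)$Complete
al
```

The alias matrix has one row per dropped term, so here the x1 row should show that x1 is 2 times x. With many predictors, the same two calls give the list of 384 - 373 = 11 aliased columns and the combinations behind them, assuming exact collinearity is indeed the cause.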
Upvotes: 1