natash

Reputation: 127

Error Linear Regression with missing data

I've got a dataset like this:

dat1 <- read.table(header=TRUE, text="
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
3.571429 2.8 5 6 2.5 3.3 2.4 2 2.7
3.000000 3.8 4 4.2 2.7 NA NA 2.7 2.9
3.571429 4.2 4.4 5 2.45 2.3 3.14 4 3.4
6.000000 3.4 6 6.3 3.6 2.1 1.29 1.1 3.89
4.714286 2.8 4.8 4.2 2.7 2.78 1 1.1 2.1
4.714286 4.4 5 5.8 2.9 3.1 2.57 1.7 3.56
4.571429 3.8 4.2 4.8 2.36 3.56 2 1.1 3.4
1.857143 1.8 2.4 3.2 3 2.3 2.1 2.3 2.2
2.857143 4.2 3 4.2 3 2.78 2 2 2.2
3.714286 5.6 5.4 6.7 3.2 4.4 4 1.1 4.4
")

lm1 <- lm(Var1 ~ Var2 + Var3 + Var4 + Var5 + Var6 + Var7 + Var8 + Var9, dat1)
summary(lm1)

As you can see, the model can't compute the tests or the R-squared. Are there too many variables on the explanatory side? If so, why is that the case? Does anybody know how I can compute a linear regression with this dataset?

Thank you!

Upvotes: 1

Views: 132

Answers (1)

jay.sf

Reputation: 72683

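Your issue is that your model matrix does not satisfy n > m (more rows than columns). lm() drops the row containing NA, which leaves 9 observations for 9 coefficients (the intercept plus 8 predictors), i.e. you have 0 residual degrees of freedom where you need at least 1.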

dim(model.matrix(lm1))
# [1] 9 9

lm1$df.residual
# [1] 0
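
To see where the 9 rows and 9 columns come from, here is a minimal sketch that counts them directly. It rebuilds `dat1` from the question so it runs on its own; the names `n_used` and `p` are only illustrative.

```r
dat1 <- read.table(header = TRUE, text = "
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
3.571429 2.8 5 6 2.5 3.3 2.4 2 2.7
3.000000 3.8 4 4.2 2.7 NA NA 2.7 2.9
3.571429 4.2 4.4 5 2.45 2.3 3.14 4 3.4
6.000000 3.4 6 6.3 3.6 2.1 1.29 1.1 3.89
4.714286 2.8 4.8 4.2 2.7 2.78 1 1.1 2.1
4.714286 4.4 5 5.8 2.9 3.1 2.57 1.7 3.56
4.571429 3.8 4.2 4.8 2.36 3.56 2 1.1 3.4
1.857143 1.8 2.4 3.2 3 2.3 2.1 2.3 2.2
2.857143 4.2 3 4.2 3 2.78 2 2 2.2
3.714286 5.6 5.4 6.7 3.2 4.4 4 1.1 4.4
")

lm1 <- lm(Var1 ~ Var2 + Var3 + Var4 + Var5 + Var6 + Var7 + Var8 + Var9, dat1)

nrow(dat1)                           # 10 rows in the raw data
n_used <- sum(complete.cases(dat1))  # 9 rows without NA (row 2 is dropped)
p <- length(coef(lm1))               # 9 coefficients: intercept + 8 predictors

nobs(lm1)                            # rows lm() actually used, same as n_used
n_used - p                           # 0 residual degrees of freedom
```

With as many coefficients as observations the fit passes through every point exactly, so there is nothing left over to estimate the residual variance, and hence no standard errors, tests, or meaningful R-squared.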

Remove one variable and it works:

lm2 <- lm(Var1 ~ Var2 + Var3 + Var4 + Var5 + Var6 + Var7 + Var8, dat1)
summary(lm2)$coefficients
#               Estimate Std. Error    t value   Pr(>|t|)
# (Intercept) 16.0641440 0.73423938  21.878620 0.02907757
# Var2         0.5737943 0.02298313  24.965894 0.02548595
# Var3         0.5391540 0.04367360  12.345077 0.05145636
# Var4         0.1765622 0.04862931   3.630778 0.17109762
# Var5        -2.5649702 0.12579573 -20.389963 0.03119722
# Var6        -2.8890936 0.14084936 -20.511940 0.03101199
# Var7         0.8593544 0.09107647   9.435527 0.06721958
# Var8        -1.9838994 0.09997212 -19.844526 0.03205326

dim(model.matrix(lm2))
# [1] 9 8

lm2$df.residual
# [1] 1
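
Which variable you drop also matters for how many rows survive. As a sketch only (this is an alternative I am assuming might suit you, not the only option): the NAs sit in Var6 and Var7, so leaving those two out lets lm() keep all 10 rows. The name `lm3` is just for illustration, and `dat1` is rebuilt so the snippet runs on its own.

```r
dat1 <- read.table(header = TRUE, text = "
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
3.571429 2.8 5 6 2.5 3.3 2.4 2 2.7
3.000000 3.8 4 4.2 2.7 NA NA 2.7 2.9
3.571429 4.2 4.4 5 2.45 2.3 3.14 4 3.4
6.000000 3.4 6 6.3 3.6 2.1 1.29 1.1 3.89
4.714286 2.8 4.8 4.2 2.7 2.78 1 1.1 2.1
4.714286 4.4 5 5.8 2.9 3.1 2.57 1.7 3.56
4.571429 3.8 4.2 4.8 2.36 3.56 2 1.1 3.4
1.857143 1.8 2.4 3.2 3 2.3 2.1 2.3 2.2
2.857143 4.2 3 4.2 3 2.78 2 2 2.2
3.714286 5.6 5.4 6.7 3.2 4.4 4 1.1 4.4
")

# Which columns contain missing values?
names(dat1)[colSums(is.na(dat1)) > 0]   # "Var6" "Var7"

# Fit without the two incomplete predictors, so no row is dropped
lm3 <- lm(Var1 ~ Var2 + Var3 + Var4 + Var5 + Var8 + Var9, dat1)

nobs(lm3)          # 10, every row is used
lm3$df.residual    # 10 - 7 = 3, provided the design matrix has full rank
```

Either way, with so few residual degrees of freedom the estimates rest on very little data, so treat the resulting p-values with caution.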

Upvotes: 1
