user2662565

Reputation: 529

Regression summary in R returns a bunch of NAs

I'm trying to run a simple regression in R, but the summary shows a long list of coefficient estimates with NA for the standard error, t-value, and p-value. I've never seen this before.

Result:

summary(model)

Call:
lm(formula = fed$SPX.Index ~ fed$Fed.Treasuries...MM., data = fed)

Residuals:
ALL 311 residuals are 0: no residual degrees of freedom!

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                          1258.84         NA      NA       NA
fed$Fed.Treasuries...MM. 1,016,102      0.94         NA      NA       NA
fed$Fed.Treasuries...MM. 1,030,985     17.72         NA      NA       NA
fed$Fed.Treasuries...MM. 1,062,061     27.12         NA      NA       NA
fed$Fed.Treasuries...MM. 917,451      -52.77         NA      NA       NA
fed$Fed.Treasuries...MM. 949,612      -30.56         NA      NA       NA
fed$Fed.Treasuries...MM. 967,553      -23.61         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 310 and 0 DF,  p-value: NA



head(fed)
           X Fed.Treasuries...MM. Reserve.Repurchases Agency.Debt.Held Treasuries.Maturing.in.5.10.years SPX.Index
1  10/1/2008              476,621              93,063           14,500                            93,362   1161.06
2  10/8/2008              476,579              77,349           14,105                            93,353    984.94
3 10/15/2008              476,555             107,819           14,105                            94,336    907.84
4 10/22/2008              476,512              95,987           14,105                            94,327    896.78
5 10/29/2008              476,469              94,655           13,620                            94,317    930.09
6  11/5/2008              476,456              96,663           13,235                            94,312    952.77

Upvotes: 0

Views: 4172

Answers (1)

Spacedman

Reputation: 94182

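The numbers in your CSV file contain commas, so R reads those columns as text rather than as numbers. lm() then treats the predictor as a categorical variable with one level per distinct value. With as many levels as rows, the model has no residual degrees of freedom and is degenerate.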
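A quick way to confirm this diagnosis is to check the column classes and compare the number of distinct values with the number of rows. The sketch below uses a small hypothetical data frame, `fed_demo`, standing in for the real `fed`; the shortened column name `Fed.Treasuries` is an assumption for illustration.

```r
# Hypothetical three-row stand-in for the poster's `fed` data frame
fed_demo <- data.frame(
  SPX.Index = c(1161.06, 984.94, 907.84),
  Fed.Treasuries = c("476,621", "476,579", "476,555"),
  stringsAsFactors = FALSE
)

# Column classes: "character" (or "factor") marks a column that was not parsed as numeric
col_classes <- sapply(fed_demo, class)
print(col_classes)

# Every row has its own distinct text value, so the factor has one level per row
n_levels <- length(unique(fed_demo$Fed.Treasuries))
n_rows <- nrow(fed_demo)

# Intercept plus (levels - 1) dummy coefficients uses up all the rows
fit <- lm(SPX.Index ~ Fed.Treasuries, data = fed_demo)
print(df.residual(fit))
```

On the real data, `str(fed)` gives the same class information for every column at once.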

Illustration. Take this CSV file:

1, "1,234", "2,345,565"
2, "2,345", "3,234,543"
3, "3,234", "3,987,766"

Read in, fit first column (numbers) against third column (comma-separated numbers):

> fed = read.csv("commas.csv", header=FALSE)
> summary(lm(V1~V3, fed))

Call:
lm(formula = V1 ~ V3, data = fed)

Residuals:
ALL 3 residuals are 0: no residual degrees of freedom!

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)         1         NA      NA       NA
V3 3,234,543        1         NA      NA       NA
V3 3,987,766        2         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 2 and 0 DF,  p-value: NA

This is the same output you are getting, just with different column names, so commas in the numbers are almost certainly the cause in your data too.

Fix. Strip the commas and convert the column to numeric:

> fed$V3 = as.numeric(gsub(",","", fed$V3))
> summary(lm(V1~V3, fed))

Call:
lm(formula = V1 ~ V3, data = fed)

Residuals:
       1        2        3 
 0.02522 -0.05499  0.02977 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) -1.875e+00  1.890e-01  -9.922   0.0639 .
V3           1.215e-06  5.799e-08  20.952   0.0304 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06742 on 1 degrees of freedom
Multiple R-squared:  0.9977,    Adjusted R-squared:  0.9955 
F-statistic:   439 on 1 and 1 DF,  p-value: 0.03036

Repeat for every other column that contains comma-formatted numbers.
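If several columns need the same treatment, the conversion can be applied to all of them in one step. This is a minimal sketch, assuming a hypothetical helper `strip_commas` and a small made-up data frame `fed_demo` with shortened column names; the columns to convert are listed explicitly so that the date column `X` is left untouched.

```r
# Hypothetical helper: remove thousands separators, then convert to numeric
strip_commas <- function(x) {
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

# Hypothetical stand-in for the poster's data
fed_demo <- data.frame(
  X = c("10/1/2008", "10/8/2008", "10/15/2008"),
  Fed.Treasuries = c("476,621", "476,579", "476,555"),
  Reserve.Repurchases = c("93,063", "77,349", "107,819"),
  SPX.Index = c(1161.06, 984.94, 907.84),
  stringsAsFactors = FALSE
)

# Name the comma-formatted columns explicitly; X holds dates and is skipped
num_cols <- c("Fed.Treasuries", "Reserve.Repurchases")
fed_demo[num_cols] <- lapply(fed_demo[num_cols], strip_commas)

str(fed_demo)
```

The `as.character()` call inside the helper means it also works when the column was read in as a factor.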

Upvotes: 4
