Brian Sunbury

Reputation: 45

Summary of model returning NA

I'm new to R and not sure how to fix the error I'm getting.
Here is the summary of my data:

> summary(data)
        Metro                          MrktRgn     MedAge     numHmSales   
     Abilene  : 1   Austin-Waco-Hill Country  : 6   20-25: 3   Min.   :  302  
     Amarillo : 1   Far West Texas            : 1   25-30: 6   1st Qu.: 1057  
     Arlington: 1   Gulf Coast - Brazos Bottom:10   30-35:28   Median : 2098  
     Austin   : 1   Northeast Texas           :14   35-40: 6   Mean   : 7278  
     Bay Area : 1   Panhandle and South Plains: 5   45-50: 2   3rd Qu.: 5086  
     Beaumont : 1   South Texas               : 7   50-55: 1   Max.   :83174  
     (Other)  :40   West Texas                : 3                             
        AvgSlPr          totNumLs         MedHHInc          Pop         
     Min.   :123833   Min.   :  1257   Min.   :37300   Min.   :   2899  
     1st Qu.:149117   1st Qu.:  6028   1st Qu.:53100   1st Qu.:  56876  
     Median :171667   Median : 11106   Median :57000   Median : 126482  
     Mean   :188637   Mean   : 24302   Mean   :60478   Mean   : 296529  
     3rd Qu.:215175   3rd Qu.: 25472   3rd Qu.:66200   3rd Qu.: 299321  
     Max.   :303475   Max.   :224230   Max.   :99205   Max.   :2196000  
     NA's   :1 

Then I fit a model with AvgSlPr as the y variable and the other variables as x variables:

> model1 = lm(AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + totNumLs + MedHHInc + Pop)

But when I run a summary of the model, I get NA for the Std. Error, t value, and p-value columns:

> summary(model1)

Call:
lm(formula = AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + 
    totNumLs + MedHHInc + Pop)

Residuals:
ALL 45 residuals are 0: no residual degrees of freedom!

Coefficients: (15 not defined because of singularities)
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                         143175         NA      NA       NA
MetroAmarillo                        24925         NA      NA       NA
MetroArlington                       35258         NA      NA       NA
MetroAustin                         160300         NA      NA       NA
MetroBay Area                        68642         NA      NA       NA
MetroBeaumont                         5942         NA      NA       NA
...
MrktRgnWest Texas                       NA         NA      NA       NA
MedAge25-30                             NA         NA      NA       NA
MedAge30-35                             NA         NA      NA       NA
MedAge35-40                             NA         NA      NA       NA
MedAge45-50                             NA         NA      NA       NA
MedAge50-55                             NA         NA      NA       NA
numHmSales                              NA         NA      NA       NA
totNumLs                                NA         NA      NA       NA
MedHHInc                                NA         NA      NA       NA
Pop                                     NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:      1,     Adjusted R-squared:    NaN 
F-statistic:   NaN on 44 and 0 DF,  p-value: NA

Does anyone know what's going wrong and how I can fix this? Also, I'm not supposed to be using dummy variables.

Upvotes: 0

Views: 3040

Answers (1)

nya

Reputation: 2250

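Each level of your Metro factor occurs in exactly one row of the data, so the Metro coefficients alone use up every observation and leave no residual degrees of freedom for estimating standard errors. You need at least two points to fit a line. Let me demonstrate with an example: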

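# Toy data: each of the four Metro levels occurs in exactly one row
# (runif() is unseeded, so the estimates below will differ between runs)
dat = data.frame(AvgSlPr=runif(4), Metro = factor(LETTERS[1:4]), MrktRgn = runif(4))
# The intercept plus three Metro dummies already use up all four observations
model1 = lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
summary(model1)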

#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)

#Residuals:
#ALL 4 residuals are 0: no residual degrees of freedom!

#Coefficients: (1 not defined because of singularities)
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.33801         NA      NA       NA
#MetroB       0.47350         NA      NA       NA
#MetroC      -0.04118         NA      NA       NA
#MetroD       0.20047         NA      NA       NA
#MrktRgn           NA         NA      NA       NA

#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared:      1,    Adjusted R-squared:    NaN 
#F-statistic:   NaN on 3 and 0 DF,  p-value: NA
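
One way to confirm this diagnosis before fitting is to count the rows per factor level and compare the number of coefficients with the number of observations. The following is only a minimal sketch on toy data shaped like the example above; the seed and the helper names (level_counts, n_obs, n_coef, resid_df) are illustrative assumptions, not part of the original question.

```r
# Hypothetical toy data: one row per Metro level, as in the example above
set.seed(1)
dat <- data.frame(AvgSlPr = runif(4),
                  Metro   = factor(LETTERS[1:4]),
                  MrktRgn = runif(4))

# Rows per factor level: every level occurs exactly once
level_counts <- table(dat$Metro)
print(level_counts)

# The design matrix has one column per coefficient lm() will try to estimate
X      <- model.matrix(AvgSlPr ~ Metro + MrktRgn, data = dat)
n_obs  <- nrow(X)   # number of observations
n_coef <- ncol(X)   # intercept + 3 Metro dummies + MrktRgn

# With at least as many coefficients as observations, nothing is left over
fit      <- lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
resid_df <- df.residual(fit)
cat("observations:", n_obs, " coefficients:", n_coef,
    " residual df:", resid_df, "\n")
```

If the residual degrees of freedom come out as zero, the NA standard errors in summary() are expected rather than a bug.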

But if we add more data so that at least some of the factor levels have more than one row of data, the linear model can be calculated:

# Add a second row for Metro levels B, C and D (level A still has only one row)
dat = rbind(dat, data.frame(AvgSlPr=2:4, Metro=factor(LETTERS[2:4]), MrktRgn = 3:5))
model2 = lm(AvgSlPr ~ Metro + MrktRgn, data=dat)
summary(model2)

#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)

#Residuals:
#         1          2          3          4          5          6          7 
# 9.021e-17  2.643e-01  7.304e-03 -1.498e-01 -2.643e-01 -7.304e-03  1.498e-01 

#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)   
#(Intercept)  0.24279    0.30406   0.798  0.50834   
#MetroB      -0.10207    0.38858  -0.263  0.81739   
#MetroC      -0.06696    0.39471  -0.170  0.88090   
#MetroD       0.06804    0.41243   0.165  0.88413   
#MrktRgn      0.70787    0.06747  10.491  0.00896 **
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 0.3039 on 2 degrees of freedom
#Multiple R-squared:  0.9857,   Adjusted R-squared:  0.9571 
#F-statistic: 34.45 on 4 and 2 DF,  p-value: 0.02841

The data used to fit the model need to be rethought. What is the goal of the analysis? What data are needed to achieve the goal?
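
As one possible direction, and only under the assumption that Metro is just a row identifier rather than a predictor of interest, leaving it out of the formula frees up degrees of freedom. Note that lm() expands any factor on the right-hand side into dummy (indicator) columns automatically. The sketch below uses made-up data; the data frame toy, its values, and the model names fit_full and fit_reduced are hypothetical, and whether dropping Metro suits the real analysis depends on its goal.

```r
# Hypothetical data: 20 metros, each appearing once, grouped into 4 regions
set.seed(42)
n <- 20
toy <- data.frame(
  Metro    = factor(paste0("M", seq_len(n))),               # unique per row
  MrktRgn  = factor(rep(c("R1", "R2", "R3", "R4"), each = 5)),
  MedHHInc = runif(n, 37000, 99000),
  Pop      = runif(n, 3000, 2000000)
)
# Made-up response, for illustration only
toy$AvgSlPr <- 100000 + 1.5 * toy$MedHHInc + rnorm(n, sd = 10000)

# Including Metro: one dummy per metro absorbs every observation
fit_full <- lm(AvgSlPr ~ Metro + MrktRgn + MedHHInc + Pop, data = toy)

# Excluding Metro: 6 coefficients for 20 observations
fit_reduced <- lm(AvgSlPr ~ MrktRgn + MedHHInc + Pop, data = toy)

cat("residual df with Metro:   ", df.residual(fit_full), "\n")
cat("residual df without Metro:", df.residual(fit_reduced), "\n")

# Standard errors are now available for every coefficient
print(coef(summary(fit_reduced)))
```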

Upvotes: 1
