Error after applying lm in loop

Question

I am trying to run the following code. It works fine on data without NA values, but when the data contains NA values I get the following message:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

The code I use is:

# One row per Year/Firm combination: Year, Firm, both slopes, residual standard error
m <- data.frame(matrix(ncol = 5, nrow = length(unique(df$Year)) * length(unique(df$Firm))))
l <- 0
for (i in unique(df$Year)) {
  for (j in unique(df$Firm)) {
    l <- l + 1
    # Fit the model on the rows of one Year/Firm combination
    mod <- lm(Ri ~ Rm + Rz, data = df, subset = df$Year == i & df$Firm == j)
    m[l, ] <- c(i,
                as.character(j),
                mod$coefficients[2],
                mod$coefficients[3],
                summary(mod)$sigma)
  }
}
names(m) <- c("Year", "Firm", "B1", "B2", "e")

This is a sample of the data I am using:

Year   Firm    Ri    Rm    Rz
2009   A       30    55    NA
2009   A       0     55    NA
2009   A       1     55    NA
2010   A       7     55    85
2010   A       15    NA    85
2011   A       0     55    85
2011   A       3.5   55    85
2011   A       8     NA    85
2009   B       24    55    85
2009   B       30    55    85
2009   B       25    55    85
2010   B       5.2   NA    85
2010   B       11.8  55    85
2011   B       0     55    NA
2011   B       90    55    NA
2011   B       57    55    NA

Any suggestions?

Gopala · Accepted Answer

Aside from the data problem, you can rewrite your code as follows using the dplyr and broom packages:

library(dplyr)
library(broom) # provides tidy()
df$Rz <- 85 # Impute values of Rz to make the code work
df %>% group_by(Year, Firm) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))

Source: local data frame [6 x 7]
Groups: Year, Firm [6]

   Year   Firm        term estimate std.error statistic     p.value
                               
1  2009      A (Intercept) 10.33333  9.837570  1.050395 0.403735888
2  2009      B (Intercept) 26.33333  1.855921 14.188819 0.004930448
3  2010      A (Intercept)  7.00000       NaN       NaN         NaN
4  2010      B (Intercept) 11.80000       NaN       NaN         NaN
5  2011      A (Intercept)  1.75000  1.750000  1.000000 0.500000000
6  2011      B (Intercept) 49.00000 26.286879  1.864048 0.203331016

UPDATE: Adding a filter so that only the Year/Firm groups in which neither independent variable (Rm, Rz) is entirely NA are passed to lm:

df %>% group_by(Year, Firm) %>% filter(!all(is.na(Rm)) & !all(is.na(Rz))) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))
Source: local data frame [4 x 7]
Groups: Year, Firm [4]

   Year   Firm        term estimate std.error statistic     p.value
                               
1  2009      B (Intercept) 26.33333  1.855921  14.18882 0.004930448
2  2010      A (Intercept)  7.00000       NaN       NaN         NaN
3  2010      B (Intercept) 11.80000       NaN       NaN         NaN
4  2011      A (Intercept)  1.75000  1.750000   1.00000 0.500000000

This output shows an intercept-only fit because Rm and Rz do not vary within any group of the sample data, so their coefficients cannot be estimated. With data that does vary (for example, the mtcars data set), the output includes a row for each term:

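# Note: data = mtcars refits the model on the full data set for every group, which is why the three blocks below are identical; use data = . to fit each cyl group separately.
mtcars %>% group_by(cyl) %>% do(tidy(lm(mpg ~ wt + am, data = mtcars)))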
Source: local data frame [9 x 6]
Groups: cyl [3]

    cyl        term    estimate std.error   statistic      p.value
                                    
1     4 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
2     4          wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
3     4          am -0.02361522 1.5456453 -0.01527855 9.879146e-01
4     6 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
5     6          wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
6     6          am -0.02361522 1.5456453 -0.01527855 9.879146e-01
7     8 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
8     8          wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
9     8          am -0.02361522 1.5456453 -0.01527855 9.879146e-01
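
For estimates that differ by group, each group's own rows have to reach lm() (data = . in a dplyr pipeline). A minimal base-R sketch of such a per-group fit, using split() and lapply() instead of dplyr and broom; the names by_cyl, fits and coefs are only illustrative:

```r
# Split mtcars by cylinder count and fit the same model within each subset.
by_cyl <- split(mtcars, mtcars$cyl)
fits <- lapply(by_cyl, function(d) lm(mpg ~ wt + am, data = d))

# One row of coefficients per cyl group: (Intercept), wt, am.
coefs <- t(sapply(fits, coef))
print(coefs)
```

Each row of coefs comes from a separate regression, so the wt and am estimates are no longer repeated across groups.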

EDIT: Adding a simple example that reproduces the problem in the original post:

x <- 1:10
y <- 1:10
z <- NA
df <- data.frame(x = x, y = y, z = z)
lm(x ~ y + z, data = df)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases
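
The error arises because lm() drops every row that has an NA in any model variable (the default na.action is na.omit), so a group with no complete row leaves nothing to fit. Below is a hedged base-R sketch of the original loop with a guard that skips such groups. It assumes the sample data from the question, and the names vars, groups, rows and res are illustrative only. Checking complete.cases() is slightly stricter than testing each column separately, since a group can have non-NA values in both Rm and Rz and still contain no row where all variables are present.

```r
# Sample data from the question, read from inline text.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Year Firm Ri   Rm Rz
2009 A    30   55 NA
2009 A    0    55 NA
2009 A    1    55 NA
2010 A    7    55 85
2010 A    15   NA 85
2011 A    0    55 85
2011 A    3.5  55 85
2011 A    8    NA 85
2009 B    24   55 85
2009 B    30   55 85
2009 B    25   55 85
2010 B    5.2  NA 85
2010 B    11.8 55 85
2011 B    0    55 NA
2011 B    90   55 NA
2011 B    57   55 NA
")

vars <- c("Ri", "Rm", "Rz")
groups <- split(df, list(df$Year, df$Firm), drop = TRUE)

rows <- lapply(groups, function(d) {
  # Skip groups without a single complete row; lm() would stop with
  # "0 (non-NA) cases" on them.
  if (!any(complete.cases(d[, vars]))) return(NULL)
  mod <- lm(Ri ~ Rm + Rz, data = d)
  data.frame(Year = d$Year[1], Firm = d$Firm[1],
             B1 = unname(coef(mod)["Rm"]),  # NA if Rm is constant in the group
             B2 = unname(coef(mod)["Rz"]),  # NA if Rz is constant in the group
             e  = summary(mod)$sigma)
})

# Drop the skipped groups and stack the rest into one data frame.
res <- do.call(rbind, Filter(Negate(is.null), rows))
row.names(res) <- NULL
print(res)
```

On the sample data the slopes come back as NA because Rm and Rz are constant wherever they are observed; with data that varies within groups, B1 and B2 would hold the estimated coefficients.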
