I am trying to apply the following code, and it works fine with any data that has no NA values. However, when I include data with NA values, I receive the following message:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
The code I use is:
m <- data.frame(matrix(ncol = 5, nrow = length(unique(df$Year))*length(unique(df$Firm))))
l = 0
for(i in unique(df$Year)) {
for(j in unique(df$Firm)) {
l = l + 1
mod <- lm(Ri ~ Rm + Rz, data = df, subset = df$Year == i & df$Firm == j)
m[l,] <- c(i,
as.character(j),
mod$coefficients[2],
mod$coefficients[3],
summary(mod)$sigma)
}
}
names(m) <- c("Year", "Firm", "B1", "B2","e")
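For reference, a minimal sketch of the same loop that checks each Year/Firm group for at least one complete row before calling lm(), and records NA instead of stopping. It is a sketch, not the original code: it uses hypothetical toy data (two groups, one with Rz missing in every row) and loops only over the Year/Firm combinations that actually occur.

```r
# Hypothetical toy data: group 2009/A has Rz missing in every row,
# group 2009/B is complete (4 rows, so the residual df is 1).
df <- data.frame(
  Year = rep(2009L, 7),
  Firm = rep(c("A", "B"), times = c(3, 4)),
  Ri   = c(30, 0, 1, 24, 30, 25, 28),
  Rm   = c(55, 56, 57, 50, 60, 70, 65),
  Rz   = c(NA, NA, NA, 80, 85, 95, 81),
  stringsAsFactors = FALSE
)

# Loop only over Year/Firm combinations that actually occur in the data
groups <- unique(df[, c("Year", "Firm")])
rows <- vector("list", nrow(groups))

for (k in seq_len(nrow(groups))) {
  i <- groups$Year[k]
  j <- groups$Firm[k]
  grp <- df[df$Year == i & df$Firm == j, ]

  # lm() drops rows with any NA, so skip groups without a single complete row
  if (!any(complete.cases(grp[, c("Ri", "Rm", "Rz")]))) {
    rows[[k]] <- data.frame(Year = i, Firm = j, B1 = NA_real_,
                            B2 = NA_real_, e = NA_real_)
    next
  }

  mod <- lm(Ri ~ Rm + Rz, data = grp)
  rows[[k]] <- data.frame(Year = i, Firm = j,
                          B1 = unname(coef(mod)["Rm"]),
                          B2 = unname(coef(mod)["Rz"]),
                          e  = summary(mod)$sigma)
}

m <- do.call(rbind, rows)
m
```

Building each result row as a data frame keeps B1, B2 and e numeric, whereas c(i, as.character(j), ...) in the loop above turns every value into character. Looping over unique(df[, c("Year", "Firm")]) also avoids calling lm() on Year/Firm pairs with no rows at all, for example if some Firm is absent in some Year.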
This is an example of the data I am using:
Year Firm Ri Rm Rz
2009 A 30 55 NA
2009 A 0 55 NA
2009 A 1 55 NA
2010 A 7 55 85
2010 A 15 NA 85
2011 A 0 55 85
2011 A 3.5 55 85
2011 A 8 NA 85
2009 B 24 55 85
2009 B 30 55 85
2009 B 25 55 85
2010 B 5.2 NA 85
2010 B 11.8 55 85
2011 B 0 55 NA
2011 B 90 55 NA
2011 B 57 55 NA
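To reproduce the problem, the sample data above can be typed into R as follows (assuming Year is an integer, Firm a factor, and the NA cells are missing values). Counting the rows with no NA in Ri, Rm or Rz per group shows which Year/Firm combinations leave lm() nothing to fit.

```r
# Sample data from the question, typed in by hand
df <- data.frame(
  Year = rep(rep(c(2009L, 2010L, 2011L), times = c(3, 2, 3)), 2),
  Firm = factor(rep(c("A", "B"), each = 8)),
  Ri   = c(30, 0, 1, 7, 15, 0, 3.5, 8,
           24, 30, 25, 5.2, 11.8, 0, 90, 57),
  Rm   = c(55, 55, 55, 55, NA, 55, 55, NA,
           55, 55, 55, NA, 55, 55, 55, 55),
  Rz   = c(NA, NA, NA, 85, 85, 85, 85, 85,
           85, 85, 85, 85, 85, NA, NA, NA)
)

# Rows that lm() can use (no NA in Ri, Rm or Rz), counted per Year (rows) x Firm (columns)
usable <- complete.cases(df[, c("Ri", "Rm", "Rz")])
counts <- tapply(usable, list(df$Year, df$Firm), sum)
counts  # 2009/A and 2011/B have 0 usable rows
```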
Any suggestions?
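Aside from the data problem (in some Year/Firm groups, such as 2009/A and 2011/B, Rz is NA in every row, so once lm() drops the incomplete rows there is nothing left to fit), you can rewrite your code as follows using a combination of the dplyr and broom packages: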
library(dplyr)
library(broom)
df$Rz <- 85 # Impute Rz (every value set to 85) so the code runs
df %>% group_by(Year, Firm) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))
Source: local data frame [6 x 7]
Groups: Year, Firm [6]
Year Firm term estimate std.error statistic p.value
<int> <fctr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2009 A (Intercept) 10.33333 9.837570 1.050395 0.403735888
2 2009 B (Intercept) 26.33333 1.855921 14.188819 0.004930448
3 2010 A (Intercept) 7.00000 NaN NaN NaN
4 2010 B (Intercept) 11.80000 NaN NaN NaN
5 2011 A (Intercept) 1.75000 1.750000 1.000000 0.500000000
6 2011 B (Intercept) 49.00000 26.286879 1.864048 0.203331016
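Every row above is an intercept term. A quick base-R check on one group (2009/A, with Rz set to 85 as above) suggests why: Rm and Rz are constant within the group, so lm() reports them as aliased (NA coefficients) and the intercept reduces to mean(Ri) = 31/3, matching the 10.33333 shown for 2009 A. The small data frame g is a hand-typed copy for illustration.

```r
# Group 2009/A from the sample data, with Rz imputed as 85 (hand-typed copy)
g <- data.frame(Ri = c(30, 0, 1), Rm = c(55, 55, 55), Rz = c(85, 85, 85))

fit <- lm(Ri ~ Rm + Rz, data = g)
cf <- coef(fit)
print(cf)  # Rm and Rz are NA: they cannot be separated from the intercept

# The intercept-only fit reduces to the group mean of Ri
all.equal(unname(cf["(Intercept)"]), mean(g$Ri))
```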
UPDATE: Adding a filter step so that only the Year/Firm groups in which neither independent variable (Rm or Rz) is entirely NA are fit with lm (this is run on the original df, before the Rz imputation above):
df %>% group_by(Year, Firm) %>% filter(!all(is.na(Rm)) & !all(is.na(Rz))) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))
Source: local data frame [4 x 7]
Groups: Year, Firm [4]
Year Firm term estimate std.error statistic p.value
<int> <fctr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2009 B (Intercept) 26.33333 1.855921 14.18882 0.004930448
2 2010 A (Intercept) 7.00000 NaN NaN NaN
3 2010 B (Intercept) 11.80000 NaN NaN NaN
4 2011 A (Intercept) 1.75000 1.750000 1.00000 0.500000000
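One caveat on that filter condition: it only checks that Rm and Rz are not entirely NA on their own. A group can pass it and still have no row where Ri, Rm and Rz are all present, in which case lm() fails with the same error. The hypothetical two-row group below illustrates this; checking for at least one complete row with complete.cases() is a stricter test.

```r
# Hypothetical group: neither Rm nor Rz is entirely NA, yet no row has both
g <- data.frame(Ri = c(10, 20), Rm = c(55, NA), Rz = c(NA, 85))

passes_all_na_filter <- !all(is.na(g$Rm)) & !all(is.na(g$Rz))
has_complete_row     <- any(complete.cases(g[, c("Ri", "Rm", "Rz")]))

passes_all_na_filter  # TRUE: the group survives the all-NA filter
has_complete_row      # FALSE: lm(Ri ~ Rm + Rz, data = g) still has 0 rows to fit
```

Under that assumption, the filter step in the dplyr pipeline could instead read filter(any(complete.cases(Ri, Rm, Rz))), which keeps a group only if lm() has at least one row to work with.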
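This output shows only an intercept fit because Rm and Rz do not vary within any Year/Firm group in the provided sample data. If the predictors did vary (for example in the mtcars data set), you would get a row for every term. Note that the output below was generated with data = mtcars inside do(), which refits the full data set in every group and is why the three cyl blocks are identical; data = . fits each cyl group separately, so its estimates differ by group:
mtcars %>% group_by(cyl) %>% do(tidy(lm(mpg ~ wt + am, data = .)))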
Source: local data frame [9 x 6]
Groups: cyl [3]
cyl term estimate std.error statistic p.value
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 4 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
2 4 wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
3 4 am -0.02361522 1.5456453 -0.01527855 9.879146e-01
4 6 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
5 6 wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
6 6 am -0.02361522 1.5456453 -0.01527855 9.879146e-01
7 8 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13
8 8 wt -5.35281145 0.7882438 -6.79080719 1.867415e-07
9 8 am -0.02361522 1.5456453 -0.01527855 9.879146e-01
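For comparison, a base-R sketch (split() plus lapply(), a swapped-in alternative to the dplyr/broom pipeline) that fits the model separately within each cyl group. Because each fit sees only its own subset, the coefficient rows differ by group, unlike the identical blocks above.

```r
# One lm() per cyl group: split() yields a list of data frames named "4", "6", "8"
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) lm(mpg ~ wt + am, data = d))

# Coefficient matrix: one row per cyl group, one column per term
coefs <- t(sapply(fits, coef))
coefs
```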
EDIT: Adding a simple example that reproduces the problem in the original post:
x <- 1:10
y <- 1:10
z <- NA
df <- data.frame(x = x, y = y, z = z)
lm(x ~ y + z, data = df)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
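The reason is lm()'s handling of missing values: with the default na.action option (na.omit), every row containing an NA is dropped before fitting, and since z is NA in every row nothing is left. A short check on the same x, y and z, plus a tryCatch() wrapper that returns the error message instead of stopping (which may be handy inside a loop over groups):

```r
x <- 1:10
y <- 1:10
z <- NA
df <- data.frame(x = x, y = y, z = z)

getOption("na.action")   # "na.omit" unless changed
sum(complete.cases(df))  # 0: every row has an NA in z
nrow(na.omit(df))        # 0: the rows lm() is left with

# Return the error message as a string instead of stopping
res <- tryCatch(lm(x ~ y + z, data = df),
                error = function(e) conditionMessage(e))
res
```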