I have a large data frame with about 100 columns and have split it up by year. For each column x[i], I want to regress the value in the subsequent year (the dependent variable) on the value in the preceding year (the independent variable): xS = a0 + a1*xP + e
My code looks like this:
d1 <- structure(list(Date=c("2012-01-01", "2012-06-01",
"2013-01-01", "2013-06-01", "2014-01-01", "2014-06-01"),
x1=c(NA, NA, 17L, 29L, 27L, 10L),
x2=c(30L, 19L, 22L, 20L, 11L,24L),
x3=c(NA, 23L, 22L, 27L, 21L, 26L),
x4=c(30L, 28L, 23L,24L, 10L, 17L),
x5=c(NA, NA, NA, 16L, 30L, 26L)),
row.names=c(NA, 6L), class="data.frame")
rownames(d1) <- d1[, "Date"]
d1 <- d1[,-1]
df2012 <- d1[1:2,]
df2013 <- d1[3:4,]
df2014 <- d1[5:6,]
condlm <- function(i){
  if(sum(is.na(df2012[,i]))==dim(df2013)[1]) # ignore the columns only containing NA's
    return()
  else
    lm.model <- lm(df2013[,i]~df2012[,i])
  summary(lm.model)
}
lms <- lapply(1:dim(df2013)[2], condlm)
lms
zzq <- sapply(lms, coef)
zzq <- do.call(rbind.data.frame, zzq)
zzq <- zzq[grepl("(Intercept)", rownames(zzq)) ,]
EDIT 2:
lms gives me the following output:
[[1]]
NULL
[[2]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.5455 NA NA NA
df2012[, i] 0.1818 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
[[3]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27 NA NA NA
df2012[, i] NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
(1 observation deleted due to missingness)
[[4]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.0 NA NA NA
df2012[, i] -0.5 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
[[5]]
NULL
[[1]] and [[5]] give me NULL.
Is there a way to modify the function condlm so that it returns NA instead of NULL?
In the end, after extracting the intercepts with zzq <- zzq[grepl("(Intercept)", rownames(zzq)) ,] my data frame zzq should look like this:
Estimate Std. Error t value Pr(>|t|)
(Intercept) NA NaN NaN NaN
(Intercept)2 16.54545 NaN NaN NaN
(Intercept)3 27.00000 NaN NaN NaN
(Intercept)4 38.00000 NaN NaN NaN
(Intercept)5 NA NaN NaN NaN
Thanks
Reputation: 1999
You can get the standard errors, p-values, etc. with the following modification:
condlm <- function(i){
  # skip columns whose 2012 values are all NA (returns NULL for those columns)
  if(all(is.na(df2012[,i]))) {
    return()
  }
  lm.model <- lm(df2013[,i]~df2012[,i])
  # summary() adds std. errors, t values and p-values to the estimates
  summary(lm.model)
}
lms <- lapply(1:dim(df2013)[2], condlm)
lms
However, please note that with the data structured as in your example, you do not have enough observations to obtain numeric values for the standard errors etc.: each regression fits two parameters (intercept and slope) to at most two data points, which leaves zero residual degrees of freedom.
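To make the degrees-of-freedom point concrete, here is a minimal sketch using only the x2 values from your sample data. The names x_prev, x_next and fit are just illustrative, not part of your code.

```r
# x2 in 2012 (predictor) and in 2013 (response), copied from the sample data
x_prev <- c(30, 19)
x_next <- c(22, 20)

# Two observations and two parameters (intercept and slope):
# the line passes exactly through both points.
fit <- lm(x_next ~ x_prev)

coef(fit)          # the two estimates are still available
df.residual(fit)   # residual degrees of freedom: n - p = 2 - 2
```

With no residual degrees of freedom there is nothing left to estimate the error variance from, so the standard errors, t values and p-values cannot be computed. More rows per year would be needed for those columns to become numeric.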
For example, with your sample data we will get the following (partial output):
> lms
[[1]]
NULL
[[2]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.5455 NA NA NA
df2012[, i] 0.1818 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
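As for getting NA instead of NULL: one possible approach (a sketch, not the only way) is to let the function always return a one-row coefficient matrix and fill it with NA when there is nothing to fit. The function name condlm_na and the helper objects below are hypothetical; the data are the 2012 and 2013 rows from your example, retyped so the snippet runs on its own.

```r
# 2012 and 2013 rows of the sample data
df2012 <- data.frame(x1 = c(NA, NA), x2 = c(30, 19), x3 = c(NA, 23),
                     x4 = c(30, 28), x5 = c(NA, NA))
df2013 <- data.frame(x1 = c(17, 29), x2 = c(22, 20), x3 = c(22, 27),
                     x4 = c(23, 24), x5 = c(NA, 16))

# Hypothetical variant of condlm: always returns the intercept row of the
# coefficient table as a 1 x 4 matrix, filled with NA if no model can be fit.
condlm_na <- function(i) {
  cols <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
  na_row <- matrix(NA_real_, nrow = 1, ncol = 4,
                   dimnames = list("(Intercept)", cols))
  # lm() needs at least one row where both years are observed
  ok <- complete.cases(df2012[, i], df2013[, i])
  if (!any(ok)) {
    return(na_row)
  }
  fit <- lm(df2013[, i] ~ df2012[, i])
  coef(summary(fit))["(Intercept)", , drop = FALSE]
}

# Stack the intercept rows, one per column, and label them by column name
zzq_mat <- do.call(rbind, lapply(seq_along(df2013), condlm_na))
rownames(zzq_mat) <- names(df2013)
zzq <- as.data.frame(zzq_mat)
zzq
```

Because every call returns a matrix of the same shape, the rows can be bound together directly and the grepl() step is no longer needed. The rows are labelled x1 to x5 here rather than (Intercept), (Intercept)2, and so on, which is a deliberate deviation from the layout you posted.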