Reputation: 6456
I am struggling to find a way to use the power of data.table when running a large number of regression models.
Here is a simplified working case:
library(data.table)
# given a data table containing the desired variables
MyVarb <- data.table(Y=rnorm(100),
                     V1=rnorm(100),
                     V2=rnorm(100))
# given a new data table containing a series of formulas/equations in a column
DT <- data.table(eq=c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2"))
# I store the linear regression models in a second column
DT[, "models" := lapply(eq, function(i) lm(i, data=MyVarb))]
# Now, I can access the coefficients of a model (e.g. the 3rd one) like:
DT[3, models][[1]]$coefficients
(Intercept)          V1          V2
-0.01583034  0.08284029  0.01630247
However, I am curious if there are alternative ways. This doesn't work as desired:
DT[, "trial" := lm(eq, data=MyVarb)]
# ***sorry for my bad understanding of data tables and objects***
I want to run thousands of models with many more variables, so using lapply inside the data table DT is very time consuming (a couple of hours on my PC, and then I run out of the 8 GB of RAM). Is there a way to code this so it runs faster?
I would appreciate your kind help.
Upvotes: 3
Views: 2802
Reputation: 18701
If you just need the coefficients, p-values and AIC, then this will work without using up a lot of memory storing the unnecessary parts of the lm objects:
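library(data.table)

MyVarb <- data.table(Y = rnorm(100),
                     V1 = rnorm(100),
                     V2 = rnorm(100))
eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")
# fit each model, keep only its coefficient table and AIC, then stack the results
DT <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(as.formula(mod), data = MyVarb)
  cf <- summary(reg)$coefficients   # estimates, std. errors, t values, p-values
  dt <- data.table(cf)
  dt[, coef := rownames(cf)]        # term names (row names are dropped by data.table())
  dt[, aic := AIC(reg)]
  dt[, model := mod]
  dt
}))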
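The same idea can probably also be written with data.table's own grouping, keeping the formulas in a table as in the question. This is only a minimal sketch: `fit_one` is a hypothetical helper name, the column names `term`, `estimate`, `p_value` and `aic` are my own choices, and the fixed seed is there only to make the example reproducible. It still fits one model per formula, so it should mainly help with memory, not with the fitting time itself.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
MyVarb <- data.table(Y = rnorm(100), V1 = rnorm(100), V2 = rnorm(100))
DT <- data.table(eq = c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2"))

# fit_one is a hypothetical helper: it fits one model and returns only
# the small pieces we want to keep, so the full lm object can be freed
fit_one <- function(f, data) {
  reg <- lm(as.formula(f), data = data)
  cf <- summary(reg)$coefficients
  list(term     = rownames(cf),
       estimate = unname(cf[, "Estimate"]),
       p_value  = unname(cf[, "Pr(>|t|)"]),
       aic      = AIC(reg))   # length 1, recycled over the rows of the group
}

# inside j, the grouping column eq is a single string for each group
res <- DT[, fit_one(eq, MyVarb), by = eq]
```

Here `res` has one row per coefficient per model, so `res[eq == "Y ~ V1 + V2"]` returns the three rows for the third model.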
Upvotes: 3