Valentin_Ștefan

Reputation: 6456

Regression models as column in data table, R

I am struggling to find a way to use the power of data tables while running some regression models.

Here is a simplified working case:

library(data.table)

# given a data table containing desired variables
MyVarb <- data.table(Y=rnorm(100),
                     V1=rnorm(100),
                     V2=rnorm(100))

# given a new data table containing a series of formulas/equations in a column
DT <- data.table(eq=c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2"))

# I store the linear regression models in a second column
DT[, "models" := lapply(eq, function(i) lm(i, data=MyVarb))]

# Now, I can access the coefficients of a model (e.g. the 3rd one) like:
DT[3, models][[1]]$coefficients
(Intercept)          V1          V2 
-0.01583034  0.08284029  0.01630247 

However, I am curious whether there are alternative ways. For example, calling lm directly on the column doesn't work as desired:

DT[, "trial" := lm(eq, data=MyVarb)]
# ***sorry for my bad understanding of data tables and objects***

I want to run thousands of models with many more variables, so using lapply inside the data table DT is time-consuming (a couple of hours on my PC, and then I run out of the 8 GB of RAM...). Is there a way to code it faster?

I would appreciate your kind help.

Upvotes: 3

Views: 2802

Answers (1)

Dean MacGregor

Reputation: 18701

If you just need the coefficients, p-values and AIC, then this will work without using up a lot of memory storing the unnecessary parts of the lm objects:

library(data.table)

MyVarb <- data.table(Y=rnorm(100),
                     V1=rnorm(100),
                     V2=rnorm(100))
eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")
DT <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(as.formula(mod), data=MyVarb)
  # keep only the coefficient table, not the whole lm object
  coefs <- summary(reg)$coefficients
  dt <- data.table(coefs)
  dt[, coef := row.names(coefs)]
  dt[, aic := AIC(reg)]
  dt[, model := mod]
  dt  # return the per-model table explicitly
}))
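
Once the results are in one long table, pulling out a single model or comparing models is just ordinary data.table filtering. Here is a minimal, self-contained sketch of that; the names `res` and `best` and the fixed seed are my own assumptions for illustration, and it uses `keep.rownames = TRUE` plus `setnames` as an alternative way to get the term names into a column:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
MyVarb <- data.table(Y = rnorm(100), V1 = rnorm(100), V2 = rnorm(100))
eq <- c("Y ~ V1", "Y ~ V2", "Y ~ V1 + V2")

# one row per (model, term); the lm objects themselves are discarded
res <- rbindlist(lapply(eq, function(mod) {
  reg <- lm(as.formula(mod), data = MyVarb)
  # keep.rownames = TRUE puts the term names in a column called "rn"
  dt <- as.data.table(summary(reg)$coefficients, keep.rownames = TRUE)
  setnames(dt, "rn", "coef")
  dt[, aic := AIC(reg)]
  dt[, model := mod]
  dt
}))

# coefficients of the third model, like DT[3, models][[1]]$coefficients
res[model == "Y ~ V1 + V2", .(coef, Estimate)]

# formula of the model with the lowest AIC
best <- res[which.min(aic), model]
```

With thousands of formulas the table only grows by a few rows per model, which should be much lighter on RAM than a list column of full lm objects.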

Upvotes: 3
