Anneclaire

Reputation: 13

Loop a linear regression over several dependent and independent variables in a data.table and store the results

I am trying to repeat a set of linear regressions on pairs of variables inside a data.table. I have three dependent variables y1, y2, y3 and 10 explanatory variables x1 to x10. Some observations are missing in each series.

In the example below, I would like to repeat the final command (the regression by country) for each pair of ys and xs.

d <- data.table(country=rep(c('a','b','c'), c(10,10,10)), y1=rnorm(30), y2=rnorm(30), x1=runif(30), x2=runif(30))

d[(!is.na(y1) & !is.na(x1)), .(beta1=summary(lm(y1~x1))$coefficients[2,1], p1=summary(lm(y1~x1))$coefficients[2,4]), by=country]

Upvotes: 1

Views: 318

Answers (1)

chinsoon12

Reputation: 25225

Here is a more base-R approach. You can generate all combinations of x's and y's using data.table::CJ or expand.grid, then go through each combination and fit the linear regression.

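# all (x, y) pairs; the columns are named explicitly so the lookup below
# does not rely on the default V1/V2 names
combi <- CJ(x=grep("^x", names(d), value=TRUE), y=grep("^y", names(d), value=TRUE))

# fit one lm per pair; each row of combi arrives as a named character vector
lmRes <- apply(combi, 1, function(pair) {
    fml <- as.formula(paste(pair["y"], "~", pair["x"]))
    lm(fml, d)
})
lmRes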
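The loop above fits each pair on the whole table and returns a list of lm objects. To get closer to what the question asks for (a slope and p-value per country, stored in one table), the same pair-by-pair idea could be sketched as follows. This is only a sketch: fit_pair is a hypothetical helper name, the toy data mirrors the question's example, and it relies on lm dropping incomplete rows under R's default na.action.

```r
library(data.table)

set.seed(1)  # reproducible toy data with the same shape as in the question
d <- data.table(country = rep(c("a", "b", "c"), each = 10),
                y1 = rnorm(30), y2 = rnorm(30),
                x1 = runif(30), x2 = runif(30))

# every (y, x) pair, with explicit column names
combi <- CJ(y = grep("^y", names(d), value = TRUE),
            x = grep("^x", names(d), value = TRUE))

# hypothetical helper: fit one pair within each country and keep
# the slope and its p-value (rows with NA are dropped by lm by default)
fit_pair <- function(yv, xv) {
  d[, {
    cf <- summary(lm(reformulate(xv, response = yv), data = .SD))$coefficients
    .(y = yv, x = xv, beta = cf[2, 1], p = cf[2, 4])
  }, by = country]
}

# one row per country and pair
res <- rbindlist(Map(fit_pair, combi$y, combi$x))
res
```

With real data, a country whose pair has too few complete observations would make the coefficient lookup fail, so a guard on the number of complete rows may be needed there.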

Short of building a large long-format data set from d that contains every combination of x's and y's and then joining it with the combinations, there is probably no simpler way to solve this problem by joining tables.
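For completeness, the "large data set" route mentioned above might look roughly like this. It is a sketch under assumptions, not a recommendation: the id column and the names ylong, xlong, long and res_long are made up for illustration, and the toy data again mirrors the question's example.

```r
library(data.table)

set.seed(1)  # reproducible toy data with the same shape as in the question
d <- data.table(country = rep(c("a", "b", "c"), each = 10),
                y1 = rnorm(30), y2 = rnorm(30),
                x1 = runif(30), x2 = runif(30))
d[, id := .I]  # row identifier so y and x values can be re-paired

# long table of dependent variables: one row per (id, yvar)
ylong <- melt(d, id.vars = c("id", "country"),
              measure.vars = grep("^y", names(d), value = TRUE),
              variable.name = "yvar", value.name = "y")

# long table of explanatory variables: one row per (id, xvar)
xlong <- melt(d, id.vars = c("id", "country"),
              measure.vars = grep("^x", names(d), value = TRUE),
              variable.name = "xvar", value.name = "x")

# every y paired with every x for the same observation
long <- merge(ylong, xlong, by = c("id", "country"), allow.cartesian = TRUE)

# one regression per country and (yvar, xvar) pair
res_long <- long[!is.na(y) & !is.na(x), {
  cf <- summary(lm(y ~ x))$coefficients
  .(beta = cf[2, 1], p = cf[2, 4])
}, by = .(country, yvar, xvar)]
res_long
```

The intermediate table grows with the number of y's times the number of x's per observation, which is why this version is likely heavier than looping over the combinations.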

Upvotes: 1
