How to make regression based on grouped rows and loop over columns?

Question

What I want to do is to perform a regression loop that always has the same predictor but loops over responses (here: y1, y2 and y3). The problem is that I want it also to be done for each category of a grouping variable. In the example data below, I want to make the regression y_i=x for all three y variables, which would result in three regressions. But I want this to be done separately for group=a, group=b and group=c, resulting in 9 different regressions (preferably stored as lists). Cant figure out how to do it! Anyone who has an idea on how to do this?

My idea so far was to maybe do a for-loop or lapply combined with dplyr::group_by, but can't get it to work.

Example data (I have a much larger data set for the actual analysis).

set.seed(123)
dat <- data.frame(group=c(rep("a",10), rep("b",10), rep("c",10)), 
                   x=rnorm(30), y1=rnorm(30), y2=rnorm(30), y3=rnorm(30))

G. Grothendieck · Accepted Answer

1) Use lmList in nlme (which comes with R so you don't have to install it).

library(nlme)
regs <- lmList(cbind(y1, y2, y3) ~ x | group, dat)

giving an lmList object having a component for each group. We show the component for group a and the other groups are similar.

> regs$a

Call:
lm(formula = object, data = dat, na.action = na.action)

Coefficients:
             y1       y2       y3     
(Intercept)   0.2943   0.1395   0.4539
x             0.3721  -0.2206  -0.2255

2) Another approach is to perform one overall lm giving an lm object having the same coefficients as above.

lm(cbind(y1, y2, y3) ~ group + x:group + 0, dat)

3) We could also use one of several list comprehension packages. This gives a list of 9 components. The names of the components identify the combination used as does the call component (shown in the Call: line of the output) within each main component. Note t hat the current CRAN version is 0.1.0 but the code below relies on listcompr 0.1.1 which can be obtained from github until it is put on CRAN.

# install.github("patrickroocks/listcompr")
library(listcompr)
packageVersion("listcompr") # need version 0.1.1 or later

regs <- gen.named.list("{y}.{g}",
  do.call("lm", 
    list(reformulate("x", y), quote(dat), subset = bquote(dat$group == .(g)))
  ), y = c("y1", "y2", "y3"), g = unique(dat$group)
)

If you don't mind that the Call: line in the output is less descriptive then it can be simplified to:

gen.named.list("{y}.{g}", lm(reformulate("x", y), dat, subset = group == g),
  y = c("y1", "y2", "y3"), g = unique(dat$group))

Note

The input corrected from question which had two y2's.

set.seed(123)
dat <- data.frame(group=c(rep("a",10), rep("b",10), rep("c",10)), 
                   x=rnorm(30), y1=rnorm(30), y2=rnorm(30), y3=rnorm(30))

How to make regression based on grouped rows and loop over columns?

Answers (1)

Note

Related Questions