Linear regression on subsets with dependent variable per column using dlply() in R

Question

I would like to automatically produce linear regressions for a data frame for each category separately.

My data frame (slope) includes one column with time categories (timepoint), one column (slope$Abs) to be used as the independent variable, and several columns that should each be used as the dependent variable.

head(slope)
   timepoint   Abs      In1      In2      In3     Out1     Out2     Out3 ...
1:        t0 275.0 2.169214 2.169214 2.169214 2.069684 2.069684 2.069684
2:        t0 275.5 2.163937 2.163937 2.163937 2.063853 2.063853 2.063853
3:        t0 276.0 2.153298 2.158632 2.153298 2.052088 2.052088 2.057988
4: ...

In total I have 40 variables for each timepoint, and I want to end up with one linear regression per combination, such as In1~Abs[t0], In1~Abs[t1], and so on for each column. Of course I could do this manually, but I guess there must be a more elegant way to do the work.

I did my research and found out that dlply() might be the function I'm looking for. However, my attempt results in an error.

So I tried to combine the answers from two previous questions I found: one on individual variables per column and one on subsets per category.

I came up with a function like this:

lm.fun <- function(x) {summary(lm(x ~ slope$Abs, data=slope))}
lm.list <- dlply(.data=slope, .variables=slope$timepoint, .fun=lm.fun )

But I get the following error:

Error in eval.quoted(.variables, data) : 
   envir must be either NULL, a list, or an environment.

Hope someone can help me out.

Thanks a lot in advance!

neko · Accepted Answer

I have solved the issue with a simpler approach, so I wanted to update the answer.

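To make life easier, I reshaped the data frame from wide to long format with the melt() function of the reshape package, so that all measurement columns are stacked into rows. Note that the result has to be assigned back for the later steps to see it.

library(reshape)  # provides melt() and colsplit()
slope <- melt(slope, id = c("Abs", "timepoint"), variable_name = "Sites")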

By default, the stacked measurements end up in a column named "value".
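
As a rough illustration of what melt() does at this step, the sketch below reshapes a small made-up data frame. The names toy and long, the values, and the restriction to two measurement columns are assumptions for the example, not part of the original data.

```r
library(reshape)  # melt() with the variable_name argument

# Hypothetical stand-in for the real data: two timepoints, two measurement columns
toy <- data.frame(
  timepoint = rep(c("t0", "t1"), each = 3),
  Abs       = rep(c(275.0, 275.5, 276.0), times = 2),
  In1       = c(2.17, 2.16, 2.15, 2.10, 2.09, 2.08),
  Out1      = c(2.07, 2.06, 2.05, 2.00, 1.99, 1.98)
)

# Wide to long: every non-id column is stacked into "value",
# and its former column name is recorded in "Sites"
long <- melt(toy, id = c("Abs", "timepoint"), variable_name = "Sites")

str(long)
```

Each original row appears once per measurement column, so the long table has (rows x measurement columns) rows.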

Then create one column that combines both grouping variables (site and timepoint) with paste().

slope$FullTreat <- paste(slope$Sites,slope$timepoint, sep="_")

Run a function through the dataset to create separate models for each treatment combination.

library(plyr)  # provides dlply() and ldply()

# Fit one model per Sites/timepoint combination
models <- dlply(slope, ~ FullTreat, function(df) {
  lm(value ~ Abs, data = df)
})

To extract the coefficients, simply run

coefs <- ldply(models, coef)
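
To check that the pipeline up to this point behaves as expected, here is a minimal sketch on made-up data where every column is an exact linear function of Abs, so the slopes are known beforehand. The toy data frame and its values are assumptions for illustration only.

```r
library(reshape)  # melt()
library(plyr)     # dlply(), ldply()

# Hypothetical data: each measurement column is an exact linear function of Abs
abs_vals <- c(275, 276, 277, 278)
toy <- data.frame(
  timepoint = rep(c("t0", "t1"), each = 4),
  Abs       = rep(abs_vals, times = 2),
  In1       = c(1 + 2 * abs_vals, 3 + 4 * abs_vals),  # slopes 2 (t0) and 4 (t1)
  Out1      = c(5 - 1 * abs_vals, 7 - 2 * abs_vals)   # slopes -1 (t0) and -2 (t1)
)

long <- melt(toy, id = c("Abs", "timepoint"), variable_name = "Sites")
long$FullTreat <- paste(long$Sites, long$timepoint, sep = "_")

# One lm per Sites/timepoint combination
models <- dlply(long, ~ FullTreat, function(df) lm(value ~ Abs, data = df))

# One row per model: the group label, the intercept, and the slope (column Abs)
coefs <- ldply(models, coef)
print(coefs)
```

The Abs column of coefs should recover the slopes used to build the toy data, up to floating-point error.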

Then split the FullTreat column back into separate columns with colsplit(), also from reshape, and add the intercept and slope to the new data frame. This relies on the site names not containing "_" themselves:

coefs <- cbind(colsplit(coefs$FullTreat, split = "_",
                        c("Sites", "Timepoint")),
               coefs[, 2:3])
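
A possible shortcut, offered as a sketch rather than as part of the approach above: plyr also accepts several grouping variables at once through .(), and ldply() then returns them as separate label columns, so the paste() and colsplit() round trip may not be needed. The toy data and the names models2 and coefs2 are made up for the example.

```r
library(reshape)  # melt()
library(plyr)     # dlply(), ldply(), .()

# Hypothetical data with known slopes per column and timepoint
abs_vals <- c(275, 276, 277, 278)
toy <- data.frame(
  timepoint = rep(c("t0", "t1"), each = 4),
  Abs       = rep(abs_vals, times = 2),
  In1       = c(1 + 2 * abs_vals, 3 + 4 * abs_vals),
  Out1      = c(5 - 1 * abs_vals, 7 - 2 * abs_vals)
)
long <- melt(toy, id = c("Abs", "timepoint"), variable_name = "Sites")

# Group by both variables directly instead of a pasted key
models2 <- dlply(long, .(Sites, timepoint), function(df) lm(value ~ Abs, data = df))

# Sites and timepoint come back as their own columns next to the coefficients
coefs2 <- ldply(models2, coef)
print(coefs2)
```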

I haven't worked on a function that plots all the regressions from the models, but I guess this is feasible with the ldply() function.
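
One way this might look, as a sketch under assumptions: ldply() can collect the observed and fitted values of every model into a single data frame, which can then be handed to whatever plotting function is preferred. The long data frame below is made up and only mimics the shape produced by the steps above; the name fits is a placeholder.

```r
library(plyr)  # dlply(), ldply()

# Hypothetical long-format data in the shape produced by melt() and paste()
set.seed(1)
long <- data.frame(
  FullTreat = rep(c("In1_t0", "In1_t1"), each = 5),
  Abs       = rep(seq(275, 277, by = 0.5), times = 2)
)
long$value <- 2 - 0.01 * long$Abs + rnorm(nrow(long), sd = 0.001)

models <- dlply(long, ~ FullTreat, function(df) lm(value ~ Abs, data = df))

# For each model, pull the data it was fitted on (stored in m$model)
# together with the fitted values; ldply() adds the FullTreat label
fits <- ldply(models, function(m) {
  data.frame(Abs = m$model$Abs, value = m$model$value, fitted = fitted(m))
})

head(fits)
```

From here, each FullTreat subset of fits holds the points (Abs, value) and the regression line (Abs, fitted) for one model.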
