Spencer
Spencer

Reputation: 870

Automated formula construction

In a recent homework assignment, we were instructed to run 27 linear models, each time adding an additional variable (the goal was to plot the changes in R2 vs. changes in adjusted R2). I found it difficult to algorithmically create formulas like this. The code I ended up using looked like this following (note that the first column in the data frame is the dependent variable, all the rest are prospective independent variables.

 make.formula <- function(howfar) {
  formula <- c()
  for (i in 1:howfar) {
    if (i == 1) {
      formula <- paste(formula, names(d)[i], '~')}
    else if (i == howfar) {
      formula <- paste(formula, names(d)[i], '')
    }
    else {
      formula <- paste(formula, names(d)[i], '+')}
  }
  return(formula)
}

formulas <- lapply(seq(2, length(d)), make.formula)
formulas <- lapply(formulas, as.formula)
fits <- lapply(formulas, lm, data = d)

This works, but seems far from ideal, and my impression is that anything I'm doing with a for-loop in R is probably not being done the best way. Is there an easier way to algorithmically construct formulas for a given data frame?

Upvotes: 5

Views: 997

Answers (2)

Metrics
Metrics

Reputation: 15458

Map can be used to solve that problem:

mydata<-mtcars

dep<-as.list(rep("mpg~",(dim(mydata)[2]-1))) # ldependent variables with ~ 
indep1<- as.list( names(mydata)[-1])
indeno<-as.list(1:(dim(mydata)[2]-1))

myreg<-Map(function(x,y) (lm(as.formula(paste(x,paste(unlist(indep[1:y]),collapse="+"))),data=mtcars))$coefficient,dep,indeno)
   > myreg
[[1]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[2]]
(Intercept)         cyl        disp 
34.66099474 -1.58727681 -0.02058363 

[[3]]
(Intercept)         cyl        disp          hp 
34.18491917 -1.22741994 -0.01883809 -0.01467933 

[[4]]
(Intercept)         cyl        disp          hp        drat 
23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553 

[[5]]
(Intercept)         cyl        disp          hp        drat          wt 
36.00835689 -1.10748650  0.01235733 -0.02401743  0.95220742 -3.67328708 

[[6]]
(Intercept)         cyl        disp          hp        drat          wt        qsec 
26.30735899 -0.81856023  0.01320490 -0.01792993  1.32040573 -4.19083238  0.40146117 

[[7]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs 
25.88354175 -0.85665309  0.01314097 -0.01733070  1.31265550 -4.22434351  0.44873351 -0.27816899 

[[8]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
15.57313068 -0.27859352  0.01471012 -0.02144242  0.81505862 -3.94373934  0.80975689  0.36835866  2.79374984 

[[9]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.83083549 -0.16881263  0.01623358 -0.02424055  0.70590083 -4.03214213  0.86828517  0.36470431  2.55092849 
       gear 
 0.50293618 

[[10]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.30337416 -0.11144048  0.01333524 -0.02148212  0.78711097 -3.71530393  0.82104075  0.31776281  2.52022689 
       gear        carb 
 0.65541302 -0.19941925 

Upvotes: 2

Josh O&#39;Brien
Josh O&#39;Brien

Reputation: 162321

reformulate(), a nifty function for creating formulas from character vectors, might come in handy. Here's an example of what it does:

reformulate(response="Y", termlabels=c("X1", "X2", "X3"))
# Y ~ X1 + X2 + X3

And here's how you might use it in practice. (Note that I here create the formulas inside of the lm() calls. Because formula objects carry with them info about the environment they were created in, I'd be a bit hesitant to create them outside of the lm() call within which you actually want to use them.):

evars <- names(mtcars)[2:5]
ii <- lapply(1:4, seq_len)

lapply(ii, 
       function(X) {
          coef(lm(reformulate(response="mpg", termlabels=evars[X]), data=mtcars))
})
# [[1]]
# (Intercept)         cyl 
#    37.88458    -2.87579 
# 
# [[2]]
# (Intercept)         cyl        disp 
# 34.66099474 -1.58727681 -0.02058363 
# 
# [[3]]
# (Intercept)         cyl        disp          hp 
# 34.18491917 -1.22741994 -0.01883809 -0.01467933 
# 
# [[4]]
# (Intercept)         cyl        disp          hp        drat 
# 23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553 

Upvotes: 10

Related Questions