Running ANOVA for selected variables only with grep() in R

Question

I am trying to run an ANOVA for each of several outcome variables selected with grep(). Below is roughly what I have, but of course it doesn't work. It seems like there should be an elegant and efficient way to do this with purrr::map() or lapply(), but I cannot figure out how. It would also be great if the result for each variable could be stored in a list. I don't think I fully understand data types and how they work in R, which is making this very confusing. I would appreciate any advice on a solution!

varlist <- grep("num_weeks_", names(crao2), value=TRUE)
for (i in varlist) {
   anova <- aov(i ~ treatment, data = df)
   summary(anova)
   TukeyHSD(anova)
   rm(anova)
}

Greg Snow · Accepted Answer

I prefer to do things like this using lists and functions like sapply or map. Rather than doing all of your steps in the loop, I would first do all the calls to aov to create an initial list, then call summary and TukeyHSD on that list.
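For comparison, here is a sketch of how the loop from the question could be made to work. It uses the built-in mtcars data in place of the asker's crao2 (an assumption, since that data isn't shown). Two things break the original loop: i ~ treatment refers to a variable named i (which holds a character string), not to the column whose name that string contains, and inside a for loop results are not auto-printed, so the summary() and TukeyHSD() output is silently discarded.

```r
# mtcars stands in for the asker's crao2 data (assumption)
varlist <- grep("p", names(mtcars), value = TRUE)

for (v in varlist) {
  # reformulate() builds a real formula such as mpg ~ factor(gear) from the name in v
  f <- reformulate("factor(gear)", response = v)
  fit <- aov(f, data = mtcars)
  # Nothing inside a for loop is auto-printed, so print() explicitly
  print(summary(fit))
  print(TukeyHSD(fit))
}
```

This works, but each fit is overwritten on the next iteration, which is one reason to build a list instead.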

First create the list:

varlist <- grep('p', names(mtcars), value=TRUE)

aov.list <- sapply(varlist, function(v){
  f <- reformulate('factor(gear)', v)
  aov(f, data=mtcars)
}, simplify = FALSE)

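Now aov.list (or whatever you want to name it) is a list holding each of the fitted objects, and its names are the values of varlist. This is why I use sapply with simplify = FALSE rather than lapply: when given an unnamed character vector, sapply uses its values as the names of the result, while lapply does not.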
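A quick way to see the naming difference, using nchar() purely as a hypothetical stand-in for the model-fitting function:

```r
varlist <- grep("p", names(mtcars), value = TRUE)

# sapply() names the result by the character vector itself (USE.NAMES = TRUE)
named <- sapply(varlist, nchar, simplify = FALSE)
names(named)    # "mpg" "disp" "hp"

# lapply() returns an unnamed list unless its input already has names
unnamed <- lapply(varlist, nchar)
names(unnamed)  # NULL

# Naming the input first gives the same named list with lapply()
renamed <- lapply(setNames(varlist, varlist), nchar)
identical(named, renamed)  # TRUE
```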

One drawback of the above is that if you look at the call element of each fitted object, it just shows f for the formula.
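To see the drawback concretely, this sketch rebuilds the first version of aov.list on mtcars and inspects the stored call. It also suggests the problem is mostly cosmetic: formula() still recovers the real model formula from the fitted object.

```r
varlist <- grep("p", names(mtcars), value = TRUE)

aov.list <- sapply(varlist, function(v){
  f <- reformulate('factor(gear)', v)
  aov(f, data=mtcars)
}, simplify = FALSE)

# The stored call only refers to the local variable f
aov.list$mpg$call  # aov(formula = f, data = mtcars)

# The fitted model still knows its actual formula
deparse(formula(aov.list$mpg))  # "mpg ~ factor(gear)"
```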

We can make the call look more like it would if we had fit each model by hand, by substituting the formula into the call and then evaluating it:

aov.list <- sapply(varlist, function(v){
  f <- reformulate('factor(gear)', v)
  eval(substitute(aov(f, data=mtcars), list(f=f)))
}, simplify = FALSE)
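bquote() is another base R way to splice the formula into the call before evaluating it. This is a swapped-in alternative to the substitute() approach above, not part of the original answer: the .() marker inserts the current value of f, so the stored call contains the actual formula object rather than the symbol f.

```r
varlist <- grep("p", names(mtcars), value = TRUE)

aov.list <- sapply(varlist, function(v){
  f <- reformulate('factor(gear)', v)
  # .(f) is replaced by the formula itself before aov() is called
  eval(bquote(aov(.(f), data = mtcars)))
}, simplify = FALSE)

# The formula argument of the stored call is now the real formula
deparse(aov.list$mpg$call$formula)  # "mpg ~ factor(gear)"
```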

If you want to use map from the tidyverse/purrr, this does the same thing (the native pipe |> requires R 4.1 or later):

library(purrr)  # provides set_names() and map()

aov.list <- varlist |>
  set_names() |>
  map(function(v){
    f <- reformulate('factor(gear)', v)
    eval(substitute(aov(f, data=mtcars), list(f=f)))
  })

Now we can use lapply or map to do the next steps:

lapply(aov.list, summary)

aov.list |>
  map(TukeyHSD)

The above just prints the results, since we did not assign them, but we could also assign the results to new lists for further examination.
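As a sketch of that, assuming the mtcars example above, the results can be kept in lists and the p-values pulled out into a named vector and a matrix. The element names used here ("Pr(>F)", "factor(gear)", "p adj") are the ones summary() and TukeyHSD() produce for this particular model; a different grouping term would change the name of the TukeyHSD element.

```r
varlist <- grep("p", names(mtcars), value = TRUE)

aov.list <- sapply(varlist, function(v){
  f <- reformulate('factor(gear)', v)
  eval(substitute(aov(f, data=mtcars), list(f=f)))
}, simplify = FALSE)

# Keep the results instead of only printing them
summary.list <- lapply(aov.list, summary)
tukey.list <- lapply(aov.list, TukeyHSD)

# Each summary holds an ANOVA table as its first element;
# row 1 is the factor(gear) term, so this is its overall p-value
p.overall <- sapply(summary.list, function(s) s[[1]][["Pr(>F)"]][1])

# TukeyHSD results are named by term; the "p adj" column holds the
# adjusted pairwise p-values, giving one column per outcome variable
p.pairs <- sapply(tukey.list, function(tk) tk[["factor(gear)"]][, "p adj"])

p.overall
p.pairs
```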
