brainupgraded
brainupgraded

Reputation: 5

runnig ANOVA for selected variables only with grep() in r

I am trying to run ANOVA for multiple outcomes selected with grep(). Below is close to what I have, but this doesn't work, of course. It seems like there is an elegant and efficient way of doing this with purrr::map() or lapply, but I cannot figure out how. Also, it would be great if the result for each variable could be stored as list(?). I think I don't fully understand data types and how they work in R, which makes me very confused now. I would appreciate your advice on the solution!

varlist <- grep("num_weeks_", names(crao2), value=TRUE)
for (i in varlist) {
   anova <- aov(i ~ treatment, data = df)
   summary(anova)
   TukeyHSD(anova)
   rm(anova)
}

Upvotes: 0

Views: 46

Answers (2)

Greg Snow
Greg Snow

Reputation: 49660

I prefer to do things like this using lists and functions like sapply or map. Rather than doing all of your steps in the loop, I would first do all the calls to aov to create an initial list, then call summary and TukeyHSD on that list.

First create the list:

varlist <- grep('p', names(mtcars), value=TRUE)

aov.list <- sapply(varlist, function(v){
  f <- reformulate('factor(gear)', v)
  aov(f, data=mtcars)
}, simplify = FALSE)

Now aov.list (or whatever you want to name it) is a list with each of the fitted objects and the names of the list are the values of varlist (this is why I use sapply with simplify = FALSE rather than lapply).

One drawback to the above is that if you look at the call element of each list it just shows f for the formula.

We can make the call look more like we did these individually by hand by substituteing and evaluating:

aov.list <- sapply(varlist, function(v){
  f <- reformulate('factor(gear)', v)
  eval(substitute(aov(f, data=mtcars), list(f=f)))
}, simplify = FALSE)

If you want to use map from the tidyverse/purrr, this does the same thing:

aov.list <- varlist |>
  set_names() |>
  map(function(v){
    f <- reformulate('factor(gear)', v)
    eval(substitute(aov(f, data=mtcars), list(f=f)))
  })

Now we can use lapply or map to do the next steps:

lapply(aov.list, summary)

aov.list |>
  map(TukeyHSD)

The above just prints the results since we did not assign them. But we could assign the results to new lists for further examination.

Upvotes: 1

DaveArmstrong
DaveArmstrong

Reputation: 22034

You're almost there. You cannot just use i because R won't substitute it the right way. You could build a formulate from that variable name with reformulate(). In the example below, I did this with mtcars data where the treatment was as.factor(am) and each variable that has a p in it being the response.

data(mtcars)
varlist <- grep("p", names(mtcars), value=TRUE)
for (i in varlist) {
  cat("#### ", i , " ####\n")
  anova <- aov(reformulate("as.factor(am)", response=i), data = mtcars)
  print(summary(anova))
  print(TukeyHSD(anova))
  rm(anova)
}
#> ####  mpg  ####
#>               Df Sum Sq Mean Sq F value   Pr(>F)    
#> as.factor(am)  1  405.2   405.2   16.86 0.000285 ***
#> Residuals     30  720.9    24.0                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = reformulate("as.factor(am)", response = i), data = mtcars)
#> 
#> $`as.factor(am)`
#>         diff     lwr      upr    p adj
#> 1-0 7.244939 3.64151 10.84837 0.000285
#> 
#> ####  disp  ####
#>               Df Sum Sq Mean Sq F value   Pr(>F)    
#> as.factor(am)  1 166450  166450   16.12 0.000366 ***
#> Residuals     30 309735   10324                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = reformulate("as.factor(am)", response = i), data = mtcars)
#> 
#> $`as.factor(am)`
#>          diff       lwr       upr     p adj
#> 1-0 -146.8482 -221.5402 -72.15611 0.0003662
#> 
#> ####  hp  ####
#>               Df Sum Sq Mean Sq F value Pr(>F)
#> as.factor(am)  1   8619    8619   1.886   0.18
#> Residuals     30 137107    4570               
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = reformulate("as.factor(am)", response = i), data = mtcars)
#> 
#> $`as.factor(am)`
#>        diff       lwr      upr     p adj
#> 1-0 -33.417 -83.11169 16.27768 0.1798309

Created on 2025-01-09 with reprex v2.1.0

Upvotes: 3

Related Questions