Reputation: 5
I am trying to run ANOVA for multiple outcomes selected with grep(). Below is close to what I have, but this doesn't work, of course. It seems like there is an elegant and efficient way of doing this with purrr::map() or lapply, but I cannot figure out how. Also, it would be great if the result for each variable could be stored as list(?). I think I don't fully understand data types and how they work in R, which makes me very confused now. I would appreciate your advice on the solution!
varlist <- grep("num_weeks_", names(crao2), value=TRUE)
for (i in varlist) {
anova <- aov(i ~ treatment, data = df)
summary(anova)
TukeyHSD(anova)
rm(anova)
}
Upvotes: 0
Views: 46
Reputation: 49660
I prefer to do things like this using lists and functions like sapply
or map
. Rather than doing all of your steps in the loop, I would first do all the calls to aov
to create an initial list, then call summary
and TukeyHSD
on that list.
First create the list:
varlist <- grep('p', names(mtcars), value=TRUE)
aov.list <- sapply(varlist, function(v){
f <- reformulate('factor(gear)', v)
aov(f, data=mtcars)
}, simplify = FALSE)
Now aov.list
(or whatever you want to name it) is a list with each of the fitted objects and the names of the list are the values of varlist (this is why I use sapply
with simplify = FALSE
rather than lapply
).
One drawback to the above is that if you look at the call
element of each list it just shows f
for the formula.
We can make the call
look more like we did these individually by hand by substitute
ing and eval
uating:
aov.list <- sapply(varlist, function(v){
f <- reformulate('factor(gear)', v)
eval(substitute(aov(f, data=mtcars), list(f=f)))
}, simplify = FALSE)
If you want to use map
from the tidyverse/purrr, this does the same thing:
aov.list <- varlist |>
set_names() |>
map(function(v){
f <- reformulate('factor(gear)', v)
eval(substitute(aov(f, data=mtcars), list(f=f)))
})
Now we can use lapply
or map
to do the next steps:
lapply(aov.list, summary)
aov.list |>
map(TukeyHSD)
The above just prints the results since we did not assign them. But we could assign the results to new lists for further examination.
Upvotes: 1
Reputation: 22034
You're almost there. You cannot just use i
because R won't substitute it the right way. You could build a formulate from that variable name with reformulate()
. In the example below, I did this with mtcars
data where the treatment was as.factor(am)
and each variable that has a p
in it being the response.
data(mtcars)
varlist <- grep("p", names(mtcars), value=TRUE)
for (i in varlist) {
cat("#### ", i , " ####\n")
anova <- aov(reformulate("as.factor(am)", response=i), data = mtcars)
print(summary(anova))
print(TukeyHSD(anova))
rm(anova)
}
#> #### mpg ####
#> Df Sum Sq Mean Sq F value Pr(>F)
#> as.factor(am) 1 405.2 405.2 16.86 0.000285 ***
#> Residuals 30 720.9 24.0
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = reformulate("as.factor(am)", response = i), data = mtcars)
#>
#> $`as.factor(am)`
#> diff lwr upr p adj
#> 1-0 7.244939 3.64151 10.84837 0.000285
#>
#> #### disp ####
#> Df Sum Sq Mean Sq F value Pr(>F)
#> as.factor(am) 1 166450 166450 16.12 0.000366 ***
#> Residuals 30 309735 10324
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = reformulate("as.factor(am)", response = i), data = mtcars)
#>
#> $`as.factor(am)`
#> diff lwr upr p adj
#> 1-0 -146.8482 -221.5402 -72.15611 0.0003662
#>
#> #### hp ####
#> Df Sum Sq Mean Sq F value Pr(>F)
#> as.factor(am) 1 8619 8619 1.886 0.18
#> Residuals 30 137107 4570
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = reformulate("as.factor(am)", response = i), data = mtcars)
#>
#> $`as.factor(am)`
#> diff lwr upr p adj
#> 1-0 -33.417 -83.11169 16.27768 0.1798309
Created on 2025-01-09 with reprex v2.1.0
Upvotes: 3