R: function to apply Anova over different subsets of one's dataset and collect output

Question

A common task is to carry out a certain statistical analysis (such as an ANOVA, GLM or mixed model) on different subsets of a dataset and to combine the output tables, with summary coefficients and p values, into a single data frame. What I am looking for is a generic function that would take the type of model (e.g. aov(...), lm(...), glm(...) or glmer(...)) and the particular output terms for which coefficients and p values should be returned, and then run the analysis once for each subset defined by some grouping variable(s) in the dataset.

Say I have a data frame called data, and I would like to carry out a certain analysis separately for each level of the factor "replicate":

data(iris)
library(car)
data=data.frame()
for (i in 1:10) {data=rbind(data,cbind(replicate=i,iris))}
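As an aside, growing a data frame with rbind() inside a loop rebuilds it on every iteration. A minimal sketch of the same stacked dataset built in a single call, assuming only base R and the built-in iris data:

```r
# Stack 10 copies of iris, tagging each copy with its replicate number
data <- do.call(rbind, lapply(1:10, function(i) cbind(replicate = i, iris)))

# Each replicate should contain the 150 rows of iris
table(data$replicate)
```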

Using broom + dplyr, I could, for example, run an ANOVA on each subset of this data frame (grouping by replicate) and keep the p values for the term "Species" using

library(devtools)
install_github("dgrtwo/broom")
library(broom)
library(dplyr)

# One Type III Anova per replicate, tidied into a data frame, keeping only the Species row
group_by(data, replicate) %>%
  do(tidy(Anova(aov(Sepal.Length ~ Species, data = .), type = "III"))) %>%
  filter(term == "Species")

Source: local data frame [10 x 6]
Groups: replicate [10]

   replicate    term    sumsq    df statistic      p.value
       (int)   (chr)    (dbl) (dbl)     (dbl)        (dbl)
1          1 Species 189.6364     2  362.6614 2.580311e-94
2          2 Species 189.6364     2  362.6614 2.580311e-94
3          3 Species 189.6364     2  362.6614 2.580311e-94
4          4 Species 189.6364     2  362.6614 2.580311e-94
5          5 Species 189.6364     2  362.6614 2.580311e-94
6          6 Species 189.6364     2  362.6614 2.580311e-94
7          7 Species 189.6364     2  362.6614 2.580311e-94
8          8 Species 189.6364     2  362.6614 2.580311e-94
9          9 Species 189.6364     2  362.6614 2.580311e-94
10        10 Species 189.6364     2  362.6614 2.580311e-94

(I used 10 identical data subsets just as an example here)

I am looking, though, for a more generic function "Anovabygroup" that would take as arguments the data frame; the grouping variable(s) (here replicate, but it could also be a combination of several grouping variables); the model to run (in this case `aov(Sepal.Length ~ Species, data = .)`, but it could also be an lm, glm, lme, lmer or glmer model, or any other model handled by Anova()); and the factors for which to return coefficients and p values (perhaps with an option "all" to return everything). Any other options given could be passed on to the call to Anova(). Does anyone know how to do this, using code similar to the above but generalised to take these arguments? The main thing I don't know how to do is pass the model (here `aov(Sepal.Length ~ Species, data = .)`) as an argument and have it evaluated. Or does such a function perhaps already exist in some package? I think this could be useful, as I always find myself coding this task over and over again...
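A minimal base-R sketch of the evaluation step asked about, not an existing package function: substitute() captures the model call unevaluated, and eval() runs it once per subset with `.` bound to that subset, mimicking the `data = .` convention of dplyr's do(). The name Anovabygroup comes from the question, but its arguments (groups, model, terms, test_fun) are hypothetical choices. test_fun defaults to stats::anova so the sketch runs without extra packages; car::Anova could be passed instead, with extra arguments such as type = "III" forwarded through `...`.

```r
# Hypothetical helper (name taken from the question); a sketch, not a package function.
# model:    an unevaluated model call that refers to the current subset as `.`
# groups:   character vector of grouping column names (one or several)
# terms:    term labels to keep, or "all"
# test_fun: function producing the test table, e.g. stats::anova or car::Anova
Anovabygroup <- function(data, groups, model, terms = "all",
                         test_fun = stats::anova, ...) {
  model_expr <- substitute(model)  # capture the call without evaluating it
  caller_env <- parent.frame()     # where any other names in the call are looked up
  # One subset per observed combination of the grouping variables
  pieces <- split(data, data[groups], drop = TRUE)
  out <- lapply(pieces, function(sub) {
    # Evaluate the model call with `.` bound to the current subset
    fit <- eval(model_expr, list(. = sub), caller_env)
    tab <- as.data.frame(test_fun(fit, ...))
    tab <- data.frame(term = trimws(rownames(tab)), tab,
                      row.names = NULL, check.names = FALSE)
    if (!identical(terms, "all")) tab <- tab[tab$term %in% terms, , drop = FALSE]
    # Prepend this subset's grouping values to every row of its table
    keys <- sub[rep(1, nrow(tab)), groups, drop = FALSE]
    data.frame(keys, tab, row.names = NULL, check.names = FALSE)
  })
  res <- do.call(rbind, unname(out))
  rownames(res) <- NULL
  res
}
```

Under these assumptions, a call such as Anovabygroup(data, "replicate", aov(Sepal.Length ~ Species, data = .), terms = "Species", test_fun = car::Anova, type = "III") should give one Species row per replicate, much like the dplyr pipeline above, and passing several column names in groups would split by their combinations. Capturing the call with substitute() means the function must be called directly; wrapping it inside another function would need the call passed as a quoted expression instead.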

PS: I used the GitHub version of the broom package, as the current CRAN version doesn't seem to handle Anova() output well.

