Applying consecutive functions to a dataframe and outputting results of each into a table

Question

I have a large dataframe in a format similar to the one below (running to ~200 compounds).

+-----------+----------+------------+
| Treatment | Compound | Proportion |
+-----------+----------+------------+
| A         | wax      | 0.095      |
| A         | alcohol  | 0.077      |
| A         | ketone   | 0.066      |
| B         | wax      | 0.067      |
| B         | alcohol  | 0.071      |
| B         | ketone   | 0.073      |
| C         | wax      | 0.051      |
| C         | alcohol  | 0.019      |
| C         | ketone   | 0.07       |
| D         | wax      | 0.033      |
| D         | alcohol  | 0.082      |
| D         | ketone   | 0.019      |
+-----------+----------+------------+

I have run an ANOVA on the linear model

lm(Proportion ~ Treatment)

for each compound using a data.table method, and used the resulting list of compounds for which treatment is a significant factor to subset my data into "t.df".

I'd now like to use TukeyHSD to determine which treatments differ significantly from each other for each of these compounds. I realise TukeyHSD needs an "aov" object as input and that I'd need to include this in my code. I think what I want is a "tapply"-style method that runs through my list of compounds, fits the model, does the ANOVA and then Tukey's test, and saves the output in a list of matrices.

I've been trying to play around with something like the following, but without success:

mytest <- function(x) { 
  model<-lm(Proportion ~ Treatment, data=t.df)
  aovmodel<-aov(model)
  tuks<-TukeyHSD(aovmodel) 
  } 
tapply((t.df[unique(t.df$Compound)]),mytest)

This returns the error:

"Error in `[.data.frame`(t.df, unique(t.df$Compound)) : 
  undefined columns selected"

which I think is probably the least of my problems with this piece of code.

Is there any way to extract the returned Tukey's "p adj" values for each compound tested? I'm keen to avoid doing this the long way because I have a large number of compounds in my list, and anticipate running a similar analysis with different compound names on several future datasets.

Alex A. · Accepted Answer

To get the Tukey HSD for each compound as you've specified, try this:

lapply(unique(t.df$Compound),
       function(x, df)
           TukeyHSD(aov(glm(Proportion ~ Treatment,
                            data = df,
                            subset = Compound == x)))[[1]],
       df = t.df)

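For each unique compound, this calls TukeyHSD() on an ANOVA of the linear model (glm() with its default Gaussian family) fit to the subset of the data for that compound. The [[1]] keeps only the comparison matrix for Treatment, whose columns are diff, lwr, upr and "p adj". The result is an unnamed list with one such matrix per compound, in the order given by unique(t.df$Compound).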
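
If you want the "p adj" values collected in a single table, one possible approach is to name the list by compound and pull that column out of each matrix. This is only a sketch: the toy t.df below is made up for illustration (random numbers, with three replicates per treatment, which is an assumption about your data), and by_compound, tukey_list and p_adj are just illustrative names.

```r
# Hypothetical stand-in for t.df: 2 compounds x 4 treatments x 3 replicates.
# The values are random and only serve to make the example runnable.
set.seed(1)
t.df <- expand.grid(Rep = 1:3,
                    Treatment = c("A", "B", "C", "D"),
                    Compound = c("wax", "alcohol"),
                    stringsAsFactors = FALSE)
t.df$Proportion <- runif(nrow(t.df), min = 0.01, max = 0.10)
t.df$Treatment <- factor(t.df$Treatment)  # TukeyHSD compares factor levels

# One data frame per compound; split() names the list by compound.
by_compound <- split(t.df, t.df$Compound)

# One-way ANOVA per compound, keeping the Treatment comparison matrix.
tukey_list <- lapply(by_compound, function(d) {
  TukeyHSD(aov(Proportion ~ Treatment, data = d))$Treatment
})

# Collect the "p adj" column of each matrix:
# rows are treatment pairs (e.g. "B-A"), columns are compounds.
p_adj <- sapply(tukey_list, function(m) m[, "p adj"])
print(p_adj)
```

sapply() only simplifies to a matrix when every compound has the same set of treatments. If that does not hold for your data, keep tukey_list as a list and index it by compound name instead, e.g. tukey_list[["wax"]][, "p adj"].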
