Taking the mean of different treatment and rep counts in an R dataframe

Question

Below is an example data frame recording biomass accumulation over time for several samples and treatments. The number of reps differs between groups, particularly between the control and the treatments. I can calculate the mean biomass for each sample, treatment and rep grouping by subsetting the data frame, or by building a (long) list of each sample split by treatment group and then calling lapply on it. Is there a simpler or better way to do this that does not require "leaving the data frame" and needs less code?

set.seed(34)
df <- data.frame(
    SAMPLE = rep(c("S0","S1","S2"), times = c(4,15,15)),
    TREATMENT = c("Ctl","T1","T2","T3","Ctl","Ctl","Ctl",
                  "T1","T1","T1","T1","T2","T2","T2","T2",
                  "T3","T3","T3","T3","Ctl","Ctl","Ctl","T1",
                  "T1","T1","T1","T2","T2","T2","T2","T3",
                  "T3","T3","T3"),
    REPS = c(1,1,1,1, 1,2,3,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3, 
             1,2,3,4,1,2,3,4,1,2,3,4),
    BIOMASS = round(rnorm(34, mean = 22, sd = 5), digits = 2)
)

head(df)

Thanks, Franklin

akrun · Accepted Answer

We can use aggregate from base R:

aggregate(BIOMASS~SAMPLE + TREATMENT, df, mean)

Or, if the groups are 'REPS' and 'TREATMENT':

aggregate(BIOMASS~REPS + TREATMENT, df, mean)
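Since the rep counts differ between groups, it can help to report the group size next to each mean. The following is a minimal sketch rather than a definitive approach: it rebuilds the df from the question in a more compact form, and the names MEAN, N and res are my own choices for illustration. The do.call(data.frame, ...) step is only there to flatten the matrix column that aggregate returns when FUN yields more than one value.

```r
# Rebuild the example data from the question (same values, written compactly)
set.seed(34)
df <- data.frame(
    SAMPLE = rep(c("S0","S1","S2"), times = c(4,15,15)),
    TREATMENT = c("Ctl","T1","T2","T3",
                  rep(rep(c("Ctl","T1","T2","T3"), times = c(3,4,4,4)), times = 2)),
    REPS = c(1,1,1,1, rep(c(1:3, 1:4, 1:4, 1:4), times = 2)),
    BIOMASS = round(rnorm(34, mean = 22, sd = 5), digits = 2)
)

# Mean and number of observations per SAMPLE x TREATMENT group.
# FUN returns a named vector, so BIOMASS comes back as a two-column matrix.
res <- aggregate(BIOMASS ~ SAMPLE + TREATMENT, data = df,
                 FUN = function(x) c(MEAN = mean(x), N = length(x)))

# Flatten the matrix column into ordinary columns BIOMASS.MEAN and BIOMASS.N
res <- do.call(data.frame, res)
res
```

The N column makes it easy to see which means rest on a single observation (the S0 groups) and which rest on three or four reps.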

Or with data.table:

library(data.table)
setDT(df)[, .(MEAN = mean(BIOMASS)) , .(SAMPLE, TREATMENT)]
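Note that setDT(df) converts df to a data.table by reference, so df itself is changed. If you would rather keep the original data frame as it is, a variant along these lines should work. This is a sketch assuming the data.table package is installed; the names dt, out and N are my own, and keyby is used only so that the result comes back sorted by group.

```r
library(data.table)

# Rebuild the example data from the question (same values, written compactly)
set.seed(34)
df <- data.frame(
    SAMPLE = rep(c("S0","S1","S2"), times = c(4,15,15)),
    TREATMENT = c("Ctl","T1","T2","T3",
                  rep(rep(c("Ctl","T1","T2","T3"), times = c(3,4,4,4)), times = 2)),
    REPS = c(1,1,1,1, rep(c(1:3, 1:4, 1:4, 1:4), times = 2)),
    BIOMASS = round(rnorm(34, mean = 22, sd = 5), digits = 2)
)

# as.data.table() makes a copy, so df stays a plain data.frame
dt <- as.data.table(df)

# Mean and group size per SAMPLE x TREATMENT; .N is the row count of each group
out <- dt[, .(MEAN = mean(BIOMASS), N = .N), keyby = .(SAMPLE, TREATMENT)]
out
```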
