dplyr and aggregation with summarise; a simple way to get mean at diffrent levels of aggregation

Question

I am interested in the total mean, and the mean within different conditions, of some measurements preferably using dplyr's summarise function.

I'll illustrate my question in the following. Say I have some data, borrowed form this this,

dta <- read.table(header=TRUE, text='
 subject sex condition measurement
       1   M   control         7.9
       1   M     cond1        12.3
       1   M     cond2        10.7
       2   F   control         6.3
       2   F     cond1        10.6
       2   F     cond2        11.1
       3   F   control         9.5
       3   F     cond1        13.1
       3   F     cond2        13.8
       4   M   control        11.5
       4   M     cond1        13.4
       4   M     cond2        12.9
') # ; dta

I now want the mean for each sex and the mean by sex for each condition. I know how to get it for each condition, like this.

# install.packages(c("dplyr"), dependencies = TRUE)
library(dplyr) 
dta %>% 
  group_by(sex, condition) %>%
  summarise(
    mean = mean(measurement)
)
#> # A tibble: 6 x 3
#> # Groups:   sex [?]
#>      sex condition  mean
#>        
#> 1      F     cond1 11.85
#> 2      F     cond2 12.45
#> 3      F   control  7.90
#> 4      M     cond1 12.85
#> 5      M     cond2 11.80
#> 6      M   control  9.70

But, this does not give me the aggregate mean for both sexes. To get this I either have to run a separate call, i.e.

dta %>% 
  group_by(sex) %>%
  summarise(
    mean = mean(measurement)
)
#> # A tibble: 2 x 2
#>      sex     mean
#>       
#> 1      F 10.73333
#> 2      M 11.45000

or deconstruct data structure. Like this,

# install.packages(c("tidyr"), dependencies = TRUE)
library(tidyr)
dta_wide <- spread(dta, condition, measurement)

dta_wide %>% 
  group_by(sex) %>%
  summarise(
    mean_tot = mean(cond1 + cond2 + control)/3,
    mean_cond1 = mean(cond1),
    mean_cond2 = mean(cond2),
    mean_control = mean(control)        
)
#> # A tibble: 2 x 5
#>      sex mean_tot mean_cond1 mean_cond2 mean_control
#>                           
#> 1      F 10.73333      11.85      12.45          7.9
#> 2      M 11.45000      12.85      11.80          9.7

This gives me an output with both the over all mean by sex and the individual mean by condition.

However, both running two separate calls and deconstructing data seems unnecessarily cumbersome. Isn't there a simply way to add a categorical variable, here condition, as the by variable and at the same time keep the aggregate information, here mean by sex? Maybe I am overlooking something logical and shouldn't be messing with data like this?

dplyr and aggregation with summarise; a simple way to get mean at diffrent levels of aggregation

Answers (1)

Related Questions