Reputation: 7958
I am interested in the total mean, and the mean within different conditions, of some measurements preferably using dplyr
's summarise
function.
I'll illustrate my question in the following. Say I have some data, borrowed form this this,
dta <- read.table(header=TRUE, text='
subject sex condition measurement
1 M control 7.9
1 M cond1 12.3
1 M cond2 10.7
2 F control 6.3
2 F cond1 10.6
2 F cond2 11.1
3 F control 9.5
3 F cond1 13.1
3 F cond2 13.8
4 M control 11.5
4 M cond1 13.4
4 M cond2 12.9
') # ; dta
I now want the mean for each sex
and the mean by sex
for each condition
. I know how to get it for each condition
, like this.
# install.packages(c("dplyr"), dependencies = TRUE)
library(dplyr)
dta %>%
group_by(sex, condition) %>%
summarise(
mean = mean(measurement)
)
#> # A tibble: 6 x 3
#> # Groups: sex [?]
#> sex condition mean
#> <fctr> <fctr> <dbl>
#> 1 F cond1 11.85
#> 2 F cond2 12.45
#> 3 F control 7.90
#> 4 M cond1 12.85
#> 5 M cond2 11.80
#> 6 M control 9.70
But, this does not give me the aggregate mean for both sexes. To get this I either have to run a separate call, i.e.
dta %>%
group_by(sex) %>%
summarise(
mean = mean(measurement)
)
#> # A tibble: 2 x 2
#> sex mean
#> <fctr> <dbl>
#> 1 F 10.73333
#> 2 M 11.45000
or deconstruct data structure. Like this,
# install.packages(c("tidyr"), dependencies = TRUE)
library(tidyr)
dta_wide <- spread(dta, condition, measurement)
dta_wide %>%
group_by(sex) %>%
summarise(
mean_tot = mean(cond1 + cond2 + control)/3,
mean_cond1 = mean(cond1),
mean_cond2 = mean(cond2),
mean_control = mean(control)
)
#> # A tibble: 2 x 5
#> sex mean_tot mean_cond1 mean_cond2 mean_control
#> <fctr> <dbl> <dbl> <dbl> <dbl>
#> 1 F 10.73333 11.85 12.45 7.9
#> 2 M 11.45000 12.85 11.80 9.7
This gives me an output with both the over all mean by sex
and the individual mean by condition
.
However, both running two separate calls and deconstructing data seems unnecessarily cumbersome. Isn't there a simply way to add a categorical variable, here condition
, as the by
variable and at the same time keep the aggregate information, here mean by sex
? Maybe I am overlooking something logical and shouldn't be messing with data like this?
Upvotes: 2
Views: 1204
Reputation: 215137
One option is to calculate the two summaries separately, then join back:
dta %>%
group_by(sex, condition) %>%
summarise(mean = mean(measurement)) %>%
inner_join(
group_by(dta, sex) %>%
summarise(mean_tot = mean(measurement))
)
# Joining, by = "sex"
# A tibble: 6 x 4
# Groups: sex [?]
# sex condition mean mean_tot
# <fctr> <fctr> <dbl> <dbl>
#1 F cond1 11.85 10.73333
#2 F cond2 12.45 10.73333
#3 F control 7.90 10.73333
#4 M cond1 12.85 11.45000
#5 M cond2 11.80 11.45000
#6 M control 9.70 11.45000
Or use group_by
twice:
dta %>%
group_by(sex, condition) %>%
summarise(s = sum(measurement), n = n()) %>%
group_by(sex) %>%
transmute(condition, mean_tot = sum(s) / sum(n), mean = s / n)
# Adding missing grouping variables: `sex`
# A tibble: 6 x 4
# Groups: sex [2]
# sex condition mean_tot mean
# <fctr> <fctr> <dbl> <dbl>
#1 F cond1 10.73333 11.85
#2 F cond2 10.73333 12.45
#3 F control 10.73333 7.90
#4 M cond1 11.45000 12.85
#5 M cond2 11.45000 11.80
#6 M control 11.45000 9.70
Upvotes: 2