Reputation: 479
I'm finding this problem hard to search about because the terms summarize, groupby, rearrange, table are just so generic.
What I'd like to do is summarize a value after grouping by exactly two factors, and put the result in a table with rows/columns as factor1/factor2. This is a special case of groupby-exactly-two with one value per cell, but I find myself wanting this pattern a lot.
Here's a toy data set where factor "isx" has two levels and factor "grp" has three levels:
library(dplyr)
df <- tibble(
isx = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
grp = c('a', 'b', 'c', 'a', 'b', 'c',
'a', 'b', 'c', 'a', 'b', 'c'),
val = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)
df$isx <- factor(df$isx)
df$grp <- factor(df$grp)
df %>%
group_by(isx, grp) %>%
summarize(mean(val))
dplyr is great, I can easily find the summary of the value for each unique combination. But the output is a 6 row table with one row per 2x3=6 combinations of the factors.
## # A tibble: 6 x 3
## # Groups: isx [2]
## isx grp `mean(val)`
## <fct> <fct> <dbl>
## 1 FALSE a 8.5
## 2 FALSE b 9.5
## 3 FALSE c 10.5
## 4 TRUE a 2.5
## 5 TRUE b 3.5
## 6 TRUE c 4.5
In the special case of exactly two factors, where I've just grouped by both of them, I'd love to transform that result into a more readable tiny table. Something like this:
# grp
# a b c
# isX F 8.5 9.5 10.5
# T 2.5 3.5 4.5
Last thought, I'm sure I can with effort look up bits of code to literally rebuild the output tibble into this matrix that I want. But this feels like a situation where R can probably do it with one call, if I could figure out how to phrase specifically what I'm looking for.
Upvotes: 1
Views: 55
Reputation: 886938
We can use pivot_wider
with values_fn
as mean
to do the reshape as well as aggregation in a single step
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = grp, values_from = val, values_fn = mean)
-output
# A tibble: 2 x 4
# isx a b c
# <fct> <dbl> <dbl> <dbl>
#1 TRUE 2.5 3.5 4.5
#2 FALSE 8.5 9.5 10.5
which is similar to dcast
functionality (data.table/reshape2
)
library(data.table)
dcast(df, isx ~ grp, value.var = 'val', mean)
Or using xtabs
in base R
with aggregate
xtabs(val ~ isx + grp, aggregate(val ~ isx + grp, df, mean))
# grp
#isx a b c
# FALSE 8.5 9.5 10.5
# TRUE 2.5 3.5 4.5
Or with tapply
from base R
with(df, tapply(val, list(isx, grp), mean))
# a b c
#FALSE 8.5 9.5 10.5
#TRUE 2.5 3.5 4.5
Upvotes: 4