Rearrange dplyr groupby output with exactly two factors?

Question

I'm finding this problem hard to search about because the terms summarize, groupby, rearrange, table are just so generic.

What I'd like to do is summarize a value after grouping by exactly two factors, and put the result in a table with rows/columns as factor1/factor2. This is a special case of groupby-exactly-two with one value per cell, but I find myself wanting this pattern a lot.

Here's a toy data set where factor "isx" has two levels and factor "grp" has three levels:

library(dplyr)

df <- tibble(
  isx = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
          FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  grp = c('a', 'b', 'c', 'a', 'b', 'c',
          'a', 'b', 'c', 'a', 'b', 'c'),
  val = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)
df$isx <- factor(df$isx)
df$grp <- factor(df$grp)

df %>%
  group_by(isx, grp) %>%
  summarize(mean(val))

dplyr is great, I can easily find the summary of the value for each unique combination. But the output is a 6 row table with one row per 2x3=6 combinations of the factors.

## # A tibble: 6 x 3
## # Groups:   isx [2]
##   isx   grp   `mean(val)`
##           
## 1 FALSE a             8.5
## 2 FALSE b             9.5
## 3 FALSE c            10.5
## 4 TRUE  a             2.5
## 5 TRUE  b             3.5
## 6 TRUE  c             4.5

In the special case of exactly two factors, where I've just grouped by both of them, I'd love to transform that result into a more readable tiny table. Something like this:

#            grp 
#          a    b    c
#  isX  F  8.5  9.5  10.5
#       T  2.5  3.5   4.5

Last thought, I'm sure I can with effort look up bits of code to literally rebuild the output tibble into this matrix that I want. But this feels like a situation where R can probably do it with one call, if I could figure out how to phrase specifically what I'm looking for.

akrun · Accepted Answer

We can use pivot_wider with values_fn as mean to do the reshape as well as aggregation in a single step

library(dplyr)
library(tidyr)
df %>% 
   pivot_wider(names_from = grp, values_from =  val, values_fn = mean)

-output

# A tibble: 2 x 4
#  isx       a     b     c
#      
#1 TRUE    2.5   3.5   4.5
#2 FALSE   8.5   9.5  10.5

which is similar to dcast functionality (data.table/reshape2)

library(data.table)
dcast(df, isx ~ grp, value.var = 'val', mean)

Or using xtabs in base R with aggregate

xtabs(val ~ isx + grp, aggregate(val ~ isx + grp, df, mean))
#       grp
#isx        a    b    c
#  FALSE  8.5  9.5 10.5
#  TRUE   2.5  3.5  4.5

Or with tapply from base R

with(df, tapply(val, list(isx, grp), mean))
#       a   b    c
#FALSE 8.5 9.5 10.5
#TRUE  2.5 3.5  4.5

Rearrange dplyr groupby output with exactly two factors?

Answers (1)

Related Questions