Reputation: 463

Group by two factors with dplyr

I am struggling a bit with the dplyr structure in R. I would like to successively group by two different factor levels in order to obtain the sum of another variable.

Here is a reproducible example

df <- data.frame(c("A", "A", "A", "B", "C", "C","C"),
                 c("1", "1", "3", "2", "3", "2","2"),
                 c(12, 45, 78, 32, 5, 7, 8))

colnames(df) <- c("factor1","factor2","values")

And here is my try so far

library(dplyr)

test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(sum(values))

# A tibble: 5 x 3
# Groups:   factor1 [3]
factor1 factor2 `sum(values)`
<fct>   <fct>           <dbl>
1 A       1                  57
2 A       3                  78
3 B       2                  32
4 C       2                  15
5 C       3                   5

But that is not what I am looking for. I would like one row per level of factor1 and one column per level of factor2, with missing combinations shown as 0, like this:

        1   2   3 
A       57  0   78           
B       0   32  0             
C       0   15  5

Any suggestions?

Upvotes: 2

Answers (4)

ThomasIsCoding

Reputation: 102890

I think @akrun's xtabs solution is the most concise so far. Here is another base R option, using aggregate + reshape:

reshape(
  aggregate(values ~ ., df, sum),
  direction = "wide",
  idvar = "factor1",
  timevar = "factor2"
)

gives

  factor1 values.1 values.2 values.3
1       A       57       NA       78
2       B       NA       32       NA
3       C       NA       15        5
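
The reshape result above has NA where a factor1/factor2 combination never occurs, while the question asks for 0. A minimal sketch of one way to fill those cells, assuming df is the data frame from the question (the names `wide` and `value_cols` are illustrative only):

```r
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c("1", "1", "3", "2", "3", "2", "2"),
  values  = c(12, 45, 78, 32, 5, 7, 8)
)

# Sum values per factor1/factor2 pair, then spread factor2 into columns.
wide <- reshape(
  aggregate(values ~ ., df, sum),
  direction = "wide",
  idvar = "factor1",
  timevar = "factor2"
)

# Combinations that never occur come back as NA; replace them with 0.
# Only the value columns are touched, so factor1 is left as it is.
value_cols <- setdiff(names(wide), "factor1")
wide[value_cols][is.na(wide[value_cols])] <- 0
wide
```

The column names keep the `values.` prefix that reshape adds; they can be renamed afterwards if plain `1`, `2`, `3` headers are preferred.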

A data.table option

> library(data.table)
> dcast(setDT(df), factor1 ~ factor2, sum)
Using 'values' as value column. Use 'value.var' to override
   factor1  1  2  3
1:       A 57  0 78
2:       B  0 32  0
3:       C  0 15  5

Upvotes: 1

akrun

Reputation: 887961

We can use xtabs from base R

xtabs(values ~ factor1 + factor2 , df)
#       factor2
#factor1  1  2  3
#      A 57  0 78
#      B  0 32  0
#      C  0 15  5
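
Note that xtabs returns a contingency table (class "xtabs"/"table"), not a data frame. If a data frame with one row per factor1 is needed downstream, one possible conversion is sketched below, assuming the df from the question (`tab` and `wide` are illustrative names):

```r
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c("1", "1", "3", "2", "3", "2", "2"),
  values  = c(12, 45, 78, 32, 5, 7, 8)
)

# Cross-tabulate the sum of values by factor1 (rows) and factor2 (columns).
tab <- xtabs(values ~ factor1 + factor2, df)

# as.data.frame.matrix() keeps the wide layout; a plain as.data.frame()
# on a table would return the long format (one row per combination).
wide <- as.data.frame.matrix(tab)

# Move the row names into a regular column and reset the row names.
wide <- cbind(factor1 = rownames(wide), wide)
rownames(wide) <- NULL
wide
```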

Upvotes: 2

Ronak Shah

Reputation: 389325

Using pivot_wider -

tidyr::pivot_wider(df, names_from = factor2, values_from = values,
                   values_fn = sum, values_fill = 0)

#  factor1   `1`   `3`   `2`
#  <chr>   <dbl> <dbl> <dbl>
#1 A          57    78     0
#2 B           0     0    32
#3 C           0     5    15

Or in data.table -

library(data.table)
dcast(setDT(df), factor1 ~ factor2, value.var = 'values', fun.aggregate = sum)

Upvotes: 5

r2evans

Reputation: 161110

You need to "reshape" or "pivot" the data. Since you're already using dplyr, you can use tidyr::pivot_wider. (Alternatively, reshape2::dcast works similarly, though frankly I believe pivot_wider is more full-featured.)

library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(z = sum(values))
tidyr::pivot_wider(test, id_cols = factor1, names_from = "factor2",
                   values_from = "z", values_fill = 0)
# # A tibble: 3 x 4
# # Groups:   factor1 [3]
#   factor1   `1`   `3`   `2`
#   <chr>   <dbl> <dbl> <dbl>
# 1 A          57    78     0
# 2 B           0     0    32
# 3 C           0     5    15
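
In the output above the new columns appear in order of first occurrence (1, 3, 2), whereas the question lists them as 1, 2, 3. A hedged sketch of one way to get sorted columns, assuming a tidyr version that provides the `names_sort` argument of pivot_wider and accepts a bare function for `values_fn` (`wide` is an illustrative name):

```r
library(tidyr)

df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c("1", "1", "3", "2", "3", "2", "2"),
  values  = c(12, 45, 78, 32, 5, 7, 8)
)

# values_fn sums duplicate factor1/factor2 pairs, so no separate
# group_by()/summarise() step is needed; values_fill supplies the zeros;
# names_sort orders the new columns by name instead of first appearance.
wide <- pivot_wider(
  df,
  id_cols     = factor1,
  names_from  = factor2,
  values_from = values,
  values_fn   = sum,
  values_fill = 0,
  names_sort  = TRUE
)
wide
```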

Upvotes: 4
