Reputation: 463
I am struggling a bit with the dplyr structure in R. I would like to successively group by two different factor levels in order to obtain the sum of another variable.
Here is a reproducible example:
df <- data.frame(c("A", "A", "A", "B", "C", "C","C"),
c("1", "1", "3", "2", "3", "2","2"),
c(12, 45, 78, 32, 5, 7, 8))
colnames(df) <- c("factor1","factor2","values")
And here is my attempt so far:
library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(sum(values))
# A tibble: 5 x 3
# Groups: factor1 [3]
factor1 factor2 `sum(values)`
<fct> <fct> <dbl>
1 A 1 57
2 A 3 78
3 B 2 32
4 C 2 15
5 C 3 5
But that is not what I am looking for. I would like one row per level of factor1 and one column per level of factor2, with 0 filled in for combinations that do not occur, like this:
1 2 3
A 57 0 78
B 0 32 0
C 0 15 5
Any suggestions?
Upvotes: 2
Views: 3780
Reputation: 102890
I think @akrun's xtabs solution is the most concise so far. Here is another base R option, with aggregate + reshape:
reshape(
  aggregate(values ~ ., df, sum),
  direction = "wide",
  idvar = "factor1",
  timevar = "factor2"
)
gives
factor1 values.1 values.2 values.3
1 A 57 NA 78
2 B NA 32 NA
3 C NA 15 5
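The reshape result above leaves NA where a factor1/factor2 combination does not occur, whereas the question asks for 0. A minimal sketch of one way to fill those cells, assuming the same df as in the question (the name wide is only for illustration):

```r
# Same data as in the question
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c("1", "1", "3", "2", "3", "2", "2"),
  values  = c(12, 45, 78, 32, 5, 7, 8)
)

# Sum per (factor1, factor2) pair, then spread factor2 into columns
wide <- reshape(
  aggregate(values ~ ., df, sum),
  direction = "wide",
  idvar = "factor1",
  timevar = "factor2"
)

# Replace the NA cells (combinations that never occur) with 0
wide[is.na(wide)] <- 0
wide
```

This overwrites every NA in the data frame, so it is only appropriate when, as here, a missing cell really means "no rows for this combination".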
A data.table option:
> dcast(setDT(df), factor1 ~ factor2, sum)
Using 'values' as value column. Use 'value.var' to override
factor1 1 2 3
1: A 57 0 78
2: B 0 32 0
3: C 0 15 5
Upvotes: 1
Reputation: 887961
We can use xtabs from base R:
xtabs(values ~ factor1 + factor2 , df)
# factor2
#factor1 1 2 3
# A 57 0 78
# B 0 32 0
# C 0 15 5
Upvotes: 2
Reputation: 389325
Using pivot_wider:
tidyr::pivot_wider(df, names_from = factor2, values_from = values,
                   values_fn = sum, values_fill = 0)
# factor1 `1` `3` `2`
# <chr> <dbl> <dbl> <dbl>
#1 A 57 78 0
#2 B 0 0 32
#3 C 0 5 15
Or in data.table:
library(data.table)
dcast(setDT(df), factor1 ~ factor2, value.var = 'values', fun.aggregate = sum)
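In the pivot_wider output above the new columns appear in order of first appearance (1, 3, 2), while the question shows them as 1, 2, 3. If the column order matters, pivot_wider's names_sort argument can be tried. A small sketch, assuming a tidyr version that supports names_sort and values_fn, and the same df as in the question:

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c("1", "1", "3", "2", "3", "2", "2"),
  values  = c(12, 45, 78, 32, 5, 7, 8)
)

# names_sort = TRUE orders the new columns by name instead of first appearance
wide <- pivot_wider(
  df,
  names_from  = factor2,
  values_from = values,
  values_fn   = sum,
  values_fill = 0,
  names_sort  = TRUE
)
wide
```

The dcast result already comes out with the columns sorted, so this only matters for the pivot_wider route.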
Upvotes: 5
Reputation: 161110
You need to "reshape" or "pivot" the data. Since you're already using dplyr, you can use tidyr::pivot_wider. (Alternatively, reshape2::dcast will work similarly, though frankly I believe pivot_wider is more full-featured.)
library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(z = sum(values))
# id_cols is named explicitly; newer tidyr versions no longer accept it by position
tidyr::pivot_wider(test, id_cols = factor1, names_from = "factor2", values_from = "z",
                   values_fill = 0)
# # A tibble: 3 x 4
# # Groups: factor1 [3]
# factor1 `1` `3` `2`
# <chr> <dbl> <dbl> <dbl>
# 1 A 57 78 0
# 2 B 0 0 32
# 3 C 0 5 15
Upvotes: 4