ycc

Reputation: 107

How to do cumsum of 2 groups by dplyr?

I wrote the following code:

library(dplyr)

set.seed(20)
dat <- data.frame(item=c(rep("i1", 10), rep("i2", 10)),
                  choice=c(sample(1:4, 20, replace = TRUE)))

I need the cumulative sum of the choice counts within each of the 2 items:

  item  choice     n     cumsum  
1 i1         1     2          2
2 i1         2     3          5
3 i1         3     1          6
4 i1         4     4         10
5 i2         1     3          3
6 i2         2     3          6
7 i2         3     2          8
8 i2         4     2         10

I wrote:

dat %>% 
  group_by(item, choice) %>% 
  count() %>% 
  mutate(n) %>%
  mutate(cum=cumsum(n))

And get:

  item  choice     n   cum
  <fct>  <int> <int> <int>
1 i1         1     2     2
2 i1         2     3     3
3 i1         3     1     1
4 i1         4     4     4
5 i2         1     3     3
6 i2         2     3     3
7 i2         3     2     2
8 i2         4     2     2

How should I modify my code to get what I need?

Upvotes: 0

Views: 2050

Answers (2)

clemens

Reputation: 6813

You have grouped your data by item and choice, so after count() every group holds a single row and cumsum(n) simply returns n. To get the cumulative sum per item, regroup by item alone after you have summarised the data:

dat %>% 
  group_by(item, choice) %>% 
  count() %>% 
  group_by(item) %>% 
  mutate(cum=cumsum(n))

This will return:

# A tibble: 8 x 4
# Groups:   item [2]
  item  choice     n   cum
  <fct>  <int> <int> <int>
1 i1         1     2     2
2 i1         2     3     5
3 i1         3     1     6
4 i1         4     4    10
5 i2         1     3     3
6 i2         2     3     6
7 i2         3     2     8
8 i2         4     2    10
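
As a variation on the same idea, here is a minimal sketch that passes the columns straight to count() instead of calling group_by() first. It assumes dplyr is installed; the names res_count and the final ungroup() are additions for illustration, not part of the original answer. Because count(item, choice) on an ungrouped data frame returns an ungrouped result, the only grouping in play is the explicit group_by(item).

```r
library(dplyr)

set.seed(20)
dat <- data.frame(item = rep(c("i1", "i2"), each = 10),
                  choice = sample(1:4, 20, replace = TRUE))

res_count <- dat %>%
  count(item, choice) %>%      # one row per item/choice pair, no grouping kept
  group_by(item) %>%           # cumulate within each item only
  mutate(cum = cumsum(n)) %>%
  ungroup()                    # drop the grouping so later steps are not affected

res_count
```

The exact counts depend on the R version's random number generator, but the last cum value for each item should always equal 10, the number of rows per item.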

Upvotes: 1

MrFlick

Reputation: 206197

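Looks like you just need summarize() instead of count(). By default summarize() drops the last grouping variable (choice), so the result is still grouped by item and cumsum() runs within each item: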

dat %>% 
  group_by(item, choice) %>% 
  summarize(n=n()) %>% 
  mutate(cum = cumsum(n))
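
If you would rather not rely on that default, the regrouping can be spelled out. The sketch below assumes dplyr 1.0.0 or later, where summarise() accepts a .groups argument; the name res_sum and the group_vars() check are illustrative additions, not part of the original answer.

```r
library(dplyr)

set.seed(20)
dat <- data.frame(item = rep(c("i1", "i2"), each = 10),
                  choice = sample(1:4, 20, replace = TRUE))

res_sum <- dat %>%
  group_by(item, choice) %>%
  # "drop_last" removes only the last grouping variable (choice),
  # leaving the summarised rows grouped by item
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(cum = cumsum(n))

group_vars(res_sum)   # the grouping that remains after summarise()
res_sum
```

Stating .groups explicitly also silences the message that newer dplyr versions print about how the output was regrouped.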

Upvotes: 2
