Bob

Reputation: 1459

dplyr not respecting group_by when applying cumsum

As described in numerous questions on here, I should be able to take a data.frame, group it, sort by date, and then apply cumsum, to get the cumulative sum over time per grouping.

Instead, with dplyr 0.8.0, I'm getting cumulative sums that ignore the grouping.

Example code:

data.frame(
  cat = sample(c("a", "b", "c"), size = 1000, replace = TRUE),
  date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by = "day"), 1000, replace = TRUE)
) %>%
  mutate(
    x = 1
  ) %>%
  arrange(date) %>%
  group_by(cat) %>%
  mutate(x = cumsum(x)) %>%
  tail()

Now, I'd expect the last few rows to have x equal to around 300-something, for each group.

Instead I get:

# A tibble: 6 x 3
# Groups:   cat [2]
  cat   date           x
  <chr> <date>     <dbl>
1 a     1999-12-31   995
2 a     1999-12-31   996
3 c     2000-01-01   997
4 a     2000-01-01   998
5 c     2000-01-01   999
6 a     2000-01-01  1000

What am I doing wrong?

Upvotes: 2

Views: 832

Answers (1)

Sarah

Reputation: 3519

I'm guessing this is the classic problem of loading plyr after dplyr, so that plyr's mutate masks dplyr's mutate. It has nothing to do with your version of dplyr. For example:

tmp1 <- data.frame(
  cat = sample(c("a", "b", "c"), size = 1000, replace = TRUE),
  date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by = "day"), 1000, replace = TRUE)
) %>%
  mutate(x = 1)

Compare the output of

tmp1 %>%
  arrange(date) %>%
  group_by(cat) %>%
  plyr::mutate(x = cumsum(x)) %>%
  tail()

and

tmp1 %>% 
  arrange(date) %>%
  group_by(cat) %>%
  dplyr::mutate(x = cumsum(x)) %>%
  tail()

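plyr's mutate doesn't understand grouping: it ignores the groups set by group_by, so cumsum runs over the whole data frame instead of within each cat.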
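The difference is easier to see on a small deterministic data set than on random samples. The following is a minimal sketch, assuming dplyr is installed; the data frame `df` and the variable names are made up for illustration, and the plyr comparison only runs if plyr happens to be installed.

```r
library(dplyr)

# Hypothetical toy data: two groups, every row contributes 1
df <- data.frame(
  cat = c("a", "b", "a", "b", "a"),
  x = 1
)

# Calling dplyr::mutate explicitly applies cumsum within each group,
# regardless of which package was loaded last
grouped <- df %>%
  group_by(cat) %>%
  dplyr::mutate(x = cumsum(x)) %>%
  ungroup()

grouped$x
# 1 1 2 2 3  (a counts 1, 2, 3; b counts 1, 2)

# plyr::mutate evaluates cumsum over the whole column, ignoring the groups
if (requireNamespace("plyr", quietly = TRUE)) {
  ungrouped <- df %>%
    group_by(cat) %>%
    plyr::mutate(x = cumsum(x))
  print(ungrouped$x)
}
```

Prefixing the call with `dplyr::` is the quickest fix when both packages have to stay loaded.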

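You can verify whether this is the problem using search(): if package:plyr appears before package:dplyr in the search path, plyr's mutate is masking dplyr's.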
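A short diagnostic along these lines may help. It is only a sketch using base R functions, assuming dplyr is attached; the variable names `mutate_source` and `positions` are made up for illustration.

```r
library(dplyr)

# Which package does the bare name `mutate` resolve to right now?
# "dplyr" is what you want here; "plyr" means dplyr's version is masked
mutate_source <- environmentName(environment(mutate))
print(mutate_source)

# Attached packages in lookup order; earlier entries mask later ones
print(search())

# Position of each package on the search path (NA if not attached)
positions <- match(c("package:plyr", "package:dplyr"), search())
print(positions)
```

If plyr turns out to be masking dplyr, either call `dplyr::mutate` explicitly or load plyr before dplyr in a fresh session.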

Upvotes: 2
