As described in numerous questions here, I should be able to take a data.frame, group it, sort it by date, and then apply cumsum() to get the cumulative sum over time per group.
Instead, with dplyr 0.8.0, I'm getting cumulative sums that ignore the grouping.
Example code:
library(dplyr)

data.frame(
  cat = sample(c("a", "b", "c"), size = 1000, replace = TRUE),
  date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by = "day"), 1000, replace = TRUE)
) %>%
  mutate(x = 1) %>%
  arrange(date) %>%
  group_by(cat) %>%
  mutate(x = cumsum(x)) %>%
  tail()
Now, I'd expect the last few rows to have x equal to roughly 330 for each group (1000 rows split across 3 groups).
Instead I get:
# A tibble: 6 x 3
# Groups: cat [2]
cat date x
<chr> <date> <dbl>
1 a 1999-12-31 995
2 a 1999-12-31 996
3 c 2000-01-01 997
4 a 2000-01-01 998
5 c 2000-01-01 999
6 a 2000-01-01 1000
What am I doing wrong?
I'm guessing this is the classic problem that occurs when you load plyr after dplyr; it has nothing to do with your version of dplyr. For example:
library(dplyr)
tmp1 <- data.frame(cat = sample(c("a", "b", "c"), size = 1000, replace = TRUE),
                   date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by = "day"), 1000, replace = TRUE)) %>%
  mutate(x = 1)
See the difference between
tmp1 %>%
arrange(date) %>%
group_by(cat) %>%
plyr::mutate(x = cumsum(x)) %>%
tail()
and
tmp1 %>%
arrange(date) %>%
group_by(cat) %>%
dplyr::mutate(x = cumsum(x)) %>%
tail()
plyr's mutate() doesn't understand dplyr's grouping, so cumsum() runs over the whole column instead of within each cat.
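As a cross-check that depends on neither package's mutate(), base R's ave() computes the same per-group running total. This is a minimal sketch: the seed and the cum_x column name are arbitrary choices for illustration. The last running total in each group should equal that group's row count, which is what the grouped dplyr::mutate() version should also report.

```r
# Base-R cross-check: ave() applies FUN separately within each level of `cat`,
# so plyr/dplyr masking cannot affect it.
set.seed(1)  # arbitrary seed, only for reproducibility
tmp1 <- data.frame(
  cat  = sample(c("a", "b", "c"), size = 1000, replace = TRUE),
  date = sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"),
                1000, replace = TRUE)
)
tmp1$x <- 1

tmp1 <- tmp1[order(tmp1$date), ]                   # equivalent of arrange(date)
tmp1$cum_x <- ave(tmp1$x, tmp1$cat, FUN = cumsum)  # running total within each cat

# The final running total per group equals the number of rows in that group
last_per_group <- tapply(tmp1$cum_x, tmp1$cat, max)
print(last_per_group)
print(table(tmp1$cat))
```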
You can verify whether this is the problem with search(): if package:plyr appears before package:dplyr in the list, plyr's mutate() masks dplyr's.
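A minimal sketch of that check. plyr_masks_dplyr() is a hypothetical helper, not part of either package: it compares the positions of package:plyr and package:dplyr in search(). utils::find("mutate") lists every attached environment that defines mutate(), in lookup order, so the first entry is the one an unqualified mutate() call uses.

```r
# Hypothetical helper (not part of plyr or dplyr): TRUE when plyr sits ahead of
# dplyr on the search path, so an unqualified mutate() resolves to plyr::mutate().
plyr_masks_dplyr <- function(path = search()) {
  p <- match("package:plyr", path)
  d <- match("package:dplyr", path)
  !is.na(p) && !is.na(d) && p < d
}

plyr_masks_dplyr()

# Every attached environment that defines mutate(), in lookup order;
# the first entry is what an unqualified call will use.
find("mutate")
```

If it returns TRUE, either restart R and load plyr before dplyr, or qualify the call explicitly as dplyr::mutate().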