Reputation: 3045
I want to calculate the cumulative sum of values over all dates prior to and including the current date. The problem is that I have multiple entries for the same date, so if I use cumsum, rows that share a date get different running totals:
library(dplyr)
tribble(~date, ~value,
"2017-01-01", 1,
"2017-01-02", 2,
"2017-01-02", 3,
"2017-01-03", 4,
"2017-01-03", 5,
"2017-01-04", 6,
"2017-01-09", 9) %>%
arrange(date) %>%
mutate(to_date=cumsum(value))
This gives:
# A tibble: 7 x 3
date value to_date
<chr> <dbl> <dbl>
1 2017-01-01 1 1
2 2017-01-02 2 3
3 2017-01-02 3 6
4 2017-01-03 4 10
5 2017-01-03 5 15
6 2017-01-04 6 21
7 2017-01-09 9 30
Is there an elegant way of getting to the following output:
# A tibble: 7 x 3
date value to_date
<chr> <dbl> <dbl>
1 2017-01-01 1 1
2 2017-01-02 2 6
3 2017-01-02 3 6
4 2017-01-03 4 15
5 2017-01-03 5 15
6 2017-01-04 6 21
7 2017-01-09 9 30
For various reasons (among other things, my table has many more fields), I cannot afford to summarize by date before running the cumulative total. I (likely) need a growing window function that can calculate totals over time intervals.
Upvotes: 2
Views: 2088
Reputation: 3045
Alternatively, one could roll up the values by date on the side, calculate the cumsum of those daily totals,
and join the results back into the original data at the end.
library(dplyr)
df<-tribble(~date, ~value,
"2017-01-01", 1,
"2017-01-02", 2,
"2017-01-02", 3,
"2017-01-03", 4,
"2017-01-03", 5,
"2017-01-04", 6,
"2017-01-09", 9)
df %>% group_by(date) %>%
summarize(to_date=sum(value)) %>%
arrange(date) %>%
mutate(to_date=cumsum(to_date)) %>%
right_join(df, by=c("date"))
Result is:
# A tibble: 7 x 3
date to_date value
<chr> <dbl> <dbl>
1 2017-01-01 1 1
2 2017-01-02 6 2
3 2017-01-02 6 3
4 2017-01-03 15 4
5 2017-01-03 15 5
6 2017-01-04 21 6
7 2017-01-09 30 9
Upvotes: 0
Reputation: 887108
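We can group_by 'date' and then take the last 'to_date' within each group. Here df1 is assumed to be the data after the arrange and cumsum steps from the question, so the last row of each date already holds the total up to and including that date: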
df1 %>%
group_by(date) %>%
mutate(to_date = last(to_date))
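For completeness, here is a minimal self-contained sketch of the same idea, combining the question's cumsum step with the grouped last(). The names df and result are only illustrative, and it assumes the dates are ISO-formatted strings (so sorting them as text matches chronological order).

```r
library(dplyr)

# Sample data from the question; `df` is an illustrative name
df <- tribble(~date, ~value,
  "2017-01-01", 1,
  "2017-01-02", 2,
  "2017-01-02", 3,
  "2017-01-03", 4,
  "2017-01-03", 5,
  "2017-01-04", 6,
  "2017-01-09", 9)

result <- df %>%
  arrange(date) %>%                    # rows must be in date order for cumsum
  mutate(to_date = cumsum(value)) %>%  # running total row by row
  group_by(date) %>%
  mutate(to_date = last(to_date)) %>%  # every row of a date gets that date's final total
  ungroup()

print(result)
```

Because last() picks the final running total within each date, this should also hold when some values are negative, as long as the rows are sorted by date first.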
Upvotes: 3