dmi3kno

Reputation: 3045

cumsum() up to and including current date in dplyr

I want to calculate the cumulative sum of values over all dates up to and including the current date. The problem is that I have multiple entries for the same date, so if I use cumsum() I get different running totals for rows that share a date:

library(dplyr)
tribble(~date, ~value,
        "2017-01-01", 1,
        "2017-01-02", 2,
        "2017-01-02", 3,
        "2017-01-03", 4,
        "2017-01-03", 5,
        "2017-01-04", 6,
        "2017-01-09", 9) %>% 
  arrange(date) %>% 
  mutate(to_date=cumsum(value))
# A tibble: 7 x 3
        date value  to_date
       <chr> <dbl>    <dbl>
1 2017-01-01     1        1
2 2017-01-02     2        3
3 2017-01-02     3        6
4 2017-01-03     4       10
5 2017-01-03     5       15
6 2017-01-04     6       21
7 2017-01-09     9       30

Is there an elegant way of getting to the following output:

# A tibble: 7 x 3
        date value  to_date
       <chr> <dbl>    <dbl>
1 2017-01-01     1        1
2 2017-01-02     2        6
3 2017-01-02     3        6
4 2017-01-03     4       15
5 2017-01-03     5       15
6 2017-01-04     6       21
7 2017-01-09     9       30

For various reasons (among other things, I have many more fields in my table), I cannot afford to summarize by date before running the cumulative total. I (likely) need a growing window function that can calculate totals over time intervals.

Upvotes: 2

Views: 2088

Answers (2)

dmi3kno

Reputation: 3045

Alternatively, one could roll up the values by date on the side, calculate the cumsum of those daily totals, and join the results back into the original data at the end.

library(dplyr)
df <- tribble(~date, ~value,
        "2017-01-01", 1,
        "2017-01-02", 2,
        "2017-01-02", 3,
        "2017-01-03", 4,
        "2017-01-03", 5,
        "2017-01-04", 6,
        "2017-01-09", 9) 

df %>% group_by(date) %>% 
  summarize(to_date=sum(value)) %>% 
  arrange(date) %>% 
  mutate(to_date=cumsum(to_date)) %>% 
  right_join(df, by=c("date"))

Result is:

# A tibble: 7 x 3
        date to_date value
       <chr>   <dbl> <dbl>
1 2017-01-01       1     1
2 2017-01-02       6     2
3 2017-01-02       6     3
4 2017-01-03      15     4
5 2017-01-03      15     5
6 2017-01-04      21     6
7 2017-01-09      30     9
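
Note that right_join() puts to_date ahead of value in the result. If the original column order matters, a possible variation (a sketch only, where totals and result are names I made up for illustration) is to keep the per-date totals in a separate table and left_join() them onto df:

```r
library(dplyr)

df <- tribble(~date, ~value,
        "2017-01-01", 1,
        "2017-01-02", 2,
        "2017-01-02", 3,
        "2017-01-03", 4,
        "2017-01-03", 5,
        "2017-01-04", 6,
        "2017-01-09", 9)

# One row per date: daily total first, then the running total across dates.
# 'totals' is a hypothetical helper table, not part of the original answer.
totals <- df %>%
  group_by(date) %>%
  summarize(to_date = sum(value)) %>%
  arrange(date) %>%
  mutate(to_date = cumsum(to_date))

# Start from df so all of its columns and rows keep their original order.
result <- df %>%
  left_join(totals, by = "date")
```

Because the join starts from df, any additional fields in the real table are carried through untouched, and to_date is simply appended as the last column.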

Upvotes: 0

akrun

Reputation: 887108

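We can group_by 'date' and then take the last 'to_date' within each group. Here 'df1' is presumably the output shown in the question, i.e. the data after arrange(date) and mutate(to_date = cumsum(value)).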

df1 %>%
    group_by(date) %>%
    mutate(to_date = last(to_date))
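
For completeness, here is a minimal end-to-end sketch of this approach. It assumes df1 is built from the question's sample data with the question's own arrange() and cumsum() steps; the name result and the final ungroup() are additions for illustration, not part of the answer above:

```r
library(dplyr)

# Assumption: df1 is the question's data with the plain running total added.
df1 <- tribble(~date, ~value,
        "2017-01-01", 1,
        "2017-01-02", 2,
        "2017-01-02", 3,
        "2017-01-03", 4,
        "2017-01-03", 5,
        "2017-01-04", 6,
        "2017-01-09", 9) %>%
  arrange(date) %>%
  mutate(to_date = cumsum(value))

# Within each date, the last running total already includes every row
# for that date, so it is broadcast to all rows of the group.
result <- df1 %>%
  group_by(date) %>%
  mutate(to_date = last(to_date)) %>%
  ungroup()
```

This relies on the rows being sorted by date before cumsum() is applied; otherwise the last value within a group would not be the total through that date.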

Upvotes: 3
