Suppose Tom placed two orders, one on Monday and one on Friday. I want an efficient way to insert zero-purchase rows for Tuesday, Wednesday and Thursday, which do not exist in my data, so that I can calculate Tom's cumulative total spending for every day.
My current approach is to create a massive user-date table (every date from 2010 to 2011), full-merge it with the existing data, fill in the missing values, and then calculate the cumsum.
# Sample data: one row per order
user <- c("Tom", "Tom", "Jim", "Jim")
order_time <- c("2018-01-01", "2018-01-04", "2018-01-02", "2018-01-04")
total_spending <- c(20, 80, 50, 60)
dt <- data.frame(user, order_time, total_spending)
> dt
user order_time total_spending
1 Tom 2018-01-01 20
2 Tom 2018-01-04 80
3 Jim 2018-01-02 50
4 Jim 2018-01-04 60
Desired output:
user order_time total_spending cumulative_spending
1 Tom 2018-01-01 20 20
2 Tom 2018-01-02 0 20
3 Tom 2018-01-03 0 20
4 Tom 2018-01-04 80 100
5 Jim 2018-01-02 50 50
6 Jim 2018-01-03 0 50
7 Jim 2018-01-04 60 110
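For reference, the full-grid approach described above (user-date grid, merge, fill missing values, cumsum) might look roughly like this in base R. This is a minimal sketch, assuming the grid only needs to span the dates present in dt rather than all of 2010 to 2011. Because one grid is shared by all users, it also produces a row for Jim on 2018-01-01, before his first order, which the desired output above leaves out.

```r
# Example data from the question, with order_time already converted to Date
user <- c("Tom", "Tom", "Jim", "Jim")
order_time <- as.Date(c("2018-01-01", "2018-01-04", "2018-01-02", "2018-01-04"))
total_spending <- c(20, 80, 50, 60)
dt <- data.frame(user, order_time, total_spending, stringsAsFactors = FALSE)

# 1. Full user x date grid over the whole observed date range (shared by all users)
all_dates <- seq(min(dt$order_time), max(dt$order_time), by = "day")
grid <- expand.grid(user = unique(dt$user), order_time = all_dates,
                    stringsAsFactors = FALSE)

# 2. Join the orders onto the grid; every (user, date) in dt is already in the
#    grid, so this left join keeps the same rows a full merge would
full <- merge(grid, dt, by = c("user", "order_time"), all.x = TRUE)

# 3. Days with no order get 0 spending
full$total_spending[is.na(full$total_spending)] <- 0

# 4. Running total per user, computed in date order
full <- full[order(full$user, full$order_time), ]
full$cumulative_spending <- ave(full$total_spending, full$user, FUN = cumsum)
```

The grid grows as (number of users) x (number of days), which is why building it over a long fixed range gets expensive.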
You can use complete() from tidyr together with seq.Date(), so each user only gets the days between their own first and last order:
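library(dplyr)  # mutate(), group_by(), %>%
library(tidyr)  # complete(), replace_na()

dt %>%
  mutate(order_time = as.Date(order_time)) %>%   # character -> Date
  group_by(user) %>%
  # add every missing day between each user's first and last order
  complete(order_time = seq.Date(min(order_time), max(order_time), by = "day")) %>%
  replace_na(list(total_spending = 0)) %>%       # no order that day -> 0
  mutate(cumulative_spending = cumsum(total_spending))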
Output:
# A tibble: 7 x 4
# Groups: user [2]
user order_time total_spending cumulative_spending
<fct> <date> <dbl> <dbl>
1 Jim 2018-01-02 50. 50.
2 Jim 2018-01-03 0. 50.
3 Jim 2018-01-04 60. 110.
4 Tom 2018-01-01 20. 20.
5 Tom 2018-01-02 0. 20.
6 Tom 2018-01-03 0. 20.
7 Tom 2018-01-04 80. 100.
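If you would rather avoid the tidyverse, the same idea (per-user date range, fill with 0, running total) can be sketched in base R. This is not part of the answer above; it swaps in split(), merge() and cumsum() for complete() and replace_na(), and assumes at most one row per user per day.

```r
user <- c("Tom", "Tom", "Jim", "Jim")
order_time <- as.Date(c("2018-01-01", "2018-01-04", "2018-01-02", "2018-01-04"))
total_spending <- c(20, 80, 50, 60)
dt <- data.frame(user, order_time, total_spending, stringsAsFactors = FALSE)

# For each user, build one row per day between that user's first and last order
per_user <- lapply(split(dt, dt$user), function(d) {
  days <- data.frame(user = d$user[1],
                     order_time = seq(min(d$order_time), max(d$order_time), by = "day"),
                     stringsAsFactors = FALSE)
  out <- merge(days, d, by = c("user", "order_time"), all.x = TRUE)
  out$total_spending[is.na(out$total_spending)] <- 0   # no order that day -> 0
  out <- out[order(out$order_time), ]
  out$cumulative_spending <- cumsum(out$total_spending)
  out
})

# Stack the per-user results back into one data frame
res <- do.call(rbind, per_user)
rownames(res) <- NULL
res
```

Within the tidyr version, complete() also accepts a fill argument, so complete(order_time = ..., fill = list(total_spending = 0)) can replace the separate replace_na() step.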