Reputation: 177
I have a list of customers, revenue from them through two channels (online & offline) over the last few financial years. I would like to have a variable showing each customer's total revenue (online+offline) for the previous year.
Sample data is shown below with desired variablehighlighted in yellow. Calculation is shown in adjacent column.
I tried grouping by CustomerID & Fin Year, calculate sum of Revenue and use lag() function to get previous year's total revenue but it didn't work.
df %>% group_by(CustomerID, FinYear) %>% mutate(yearly_totalRevenue = sum(Revenue)) %>% mutate(lastyear_totalRevenue = lag(yearly_totalRevenue )) %>% ungroup()
Note: As the data volume is in the range of 10M, a memory efficient code (preferably using data.table functionalities) will be highly appreciated.
Thanks in advance.
Edit1: dput() of the sample data is added.
structure(list(CustomerID = c("Cust1", "Cust2", "Cust3", "Cust4",
"Cust5", "Cust1", "Cust2", "Cust3", "Cust4", "Cust5"), `Fin Year` =
c("2010/11",
"2011/12", "2012/13", "2013/14", "2014/15", "2010/11", "2011/12",
"2012/13", "2013/14", "2014/15"), Channel = c("Online", "Online",
"Online", "Online", "Online", "Offline", "Offline", "Offline",
"Offline", "Offline"), Revenue = c(858, 733, 248, 541, 222, 316,
412, 167, 385, 654)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 0
Views: 47
Reputation: 25225
You can try:
setDT(df)[, yearly_totalRevenue := sum(Revenue), .(CustomerID, FinYear)][,
lastyear_totalRevenue := shift(yearly_totalRevenue), .(rowid(CustomerID))]
output:
CustomerID FinYear Channel Revenue yearly_totalRevenue lastyear_totalRevenue
1: Cust1 2010/11 Online 858 1174 NA
2: Cust2 2011/12 Online 733 1145 1174
3: Cust3 2012/13 Online 248 415 1145
4: Cust4 2013/14 Online 541 926 415
5: Cust5 2014/15 Online 222 876 926
6: Cust1 2010/11 Offline 316 1174 NA
7: Cust2 2011/12 Offline 412 1145 1174
8: Cust3 2012/13 Offline 167 415 1145
9: Cust4 2013/14 Offline 385 926 415
10: Cust5 2014/15 Offline 654 876 926
Upvotes: 1