Reputation: 203
I have a data frame which contains date column, customer ID, and 9 different metrics for each unique combination of date and customer ID. I have one week's worth of data: '2020-04-05' to '2020-04-12'. I need to calcuate the growth rate in each variable on Sunday for every customer, i.e., metric values for each customer on '2020-04-12' divided by metric values for each customer on '2020-04-05' and use that %percent growth rate to compute metric values on next Monday, Tuesday, and Wednesday as values of metric1..metric9 on next_monday = metric1{previous_monday]*growth rate for each customer. This needs to happen for next tuesday and wednesday as well based on previous week's tuesday and wednesday.Sample data frame as:
p <- seq(0,100, by =1)
df <- data.frame(Customer_ID = p)
df_1 <- df %>%
group_by(Customer_ID) %>%
do( data.frame(., Date= seq(as.Date('2020-04-05'),as.Date('2020-04-12'), by = '1 day')))
df_1$metric_1 <- seq(200,300,by=1)
df_1$metric_2 <- seq(400,500,by=1)
df_1$metric_3 <- seq(600,700,by=1)
As a first step to compute the value the growth rate I tried this:
df1_change <- df_1 %>% dplyr::group_by(Customer_ID) %>% dplyr::mutate(metric1_change = ifelse(day_of_week == 'Sunday' & Date == '2020-04-12',
df_1$metric_1[df_1$Date == '2020-04-12']/df_1$metric_1[df_1$Date == '2020-04-05'],"NA"))
But I did not get the right growth percent. I am sure this is more elegant way of doing all the steps.
Adding few rows of sample data frame and few rows of expected output(shown as computation values):
Customer_ID Date metric_1 metric_2 metric_3 day_of_week
0 2020-04-05 200 400 600 Sunday
0 2020-04-06 201 401 601 Monday
0 2020-04-07 202 402 602 Tuesday
0 2020-04-08 203 403 603 Wednesday
0 2020-04-09 204 404 604 Thursday
0 2020-04-10 205 405 605 Friday
0 2020-04-11 206 406 606 Saturday
0 2020-04-12 207 407 607 Sunday
0 2020-04-13 (207/200)*201 (407/400)*401 (607/600)*601 Monday
0 2020-04-15 (207/200)*202 (407/400)*402 (607/600)*602 Tuesday
0 2020-04-16 (207/200)*203 (407/400)*403 (607/600)*603 Wednesday
Upvotes: 0
Views: 34
Reputation: 2064
You have a problem with the customer_ID and Date logic. 1 customer only has 1 date as a result. But something like this should work for your scenario.
df_2 <- df_1 %>%
pivot_longer(names_to = "Metrics", values_to = "value", cols = metric_1:metric_3) %>%
group_by(Customer_ID, Metrics) %>%
arrange(Date) %>%
mutate(value_prev = lag(value, 1),
improv = value / value_prev) %>%
pivot_wider(names_from = Date, values_from = value) %>%
mutate(new_period1 = Date * improv,
new_period2 = new_period1 * improv)
Upvotes: 0