victor8910
victor8910

Reputation: 203

how to compute values of multiple columns of a data frame by dividing values on later date by earlier date in a data frame in R grouped by customer id

I have a data frame which contains date column, customer ID, and 9 different metrics for each unique combination of date and customer ID. I have one week's worth of data: '2020-04-05' to '2020-04-12'. I need to calcuate the growth rate in each variable on Sunday for every customer, i.e., metric values for each customer on '2020-04-12' divided by metric values for each customer on '2020-04-05' and use that %percent growth rate to compute metric values on next Monday, Tuesday, and Wednesday as values of metric1..metric9 on next_monday = metric1{previous_monday]*growth rate for each customer. This needs to happen for next tuesday and wednesday as well based on previous week's tuesday and wednesday.Sample data frame as:

p <- seq(0,100, by =1)
df <- data.frame(Customer_ID = p)
df_1 <- df %>% 
  group_by(Customer_ID) %>% 
  do( data.frame(., Date= seq(as.Date('2020-04-05'),as.Date('2020-04-12'), by = '1 day')))                 
df_1$metric_1 <- seq(200,300,by=1)
df_1$metric_2 <- seq(400,500,by=1)
df_1$metric_3 <- seq(600,700,by=1)

As a first step to compute the value the growth rate I tried this:

df1_change <- df_1 %>% dplyr::group_by(Customer_ID) %>% dplyr::mutate(metric1_change = ifelse(day_of_week == 'Sunday' & Date == '2020-04-12',
                                                                                                        df_1$metric_1[df_1$Date == '2020-04-12']/df_1$metric_1[df_1$Date == '2020-04-05'],"NA"))

But I did not get the right growth percent. I am sure this is more elegant way of doing all the steps.

Adding few rows of sample data frame and few rows of expected output(shown as computation values):

Customer_ID  Date         metric_1        metric_2    metric_3       day_of_week
0          2020-04-05      200             400        600              Sunday
0          2020-04-06      201             401        601              Monday
0          2020-04-07      202             402        602              Tuesday
0          2020-04-08      203             403        603             Wednesday
0          2020-04-09      204             404        604             Thursday
0          2020-04-10      205             405        605              Friday
0          2020-04-11      206             406        606             Saturday
0          2020-04-12      207             407        607             Sunday
0          2020-04-13 (207/200)*201  (407/400)*401   (607/600)*601      Monday
0          2020-04-15 (207/200)*202  (407/400)*402   (607/600)*602     Tuesday
0          2020-04-16 (207/200)*203  (407/400)*403   (607/600)*603     Wednesday

Upvotes: 0

Views: 34

Answers (1)

woshishui
woshishui

Reputation: 2064

You have a problem with the customer_ID and Date logic. 1 customer only has 1 date as a result. But something like this should work for your scenario.

df_2 <- df_1 %>% 
  pivot_longer(names_to = "Metrics", values_to = "value", cols = metric_1:metric_3) %>% 
  group_by(Customer_ID, Metrics) %>% 
  arrange(Date) %>% 
  mutate(value_prev = lag(value, 1),
         improv = value / value_prev) %>%
  pivot_wider(names_from = Date, values_from = value) %>% 
  mutate(new_period1 = Date * improv,
         new_period2 = new_period1 * improv)

Upvotes: 0

Related Questions