chopin_is_the_best
chopin_is_the_best

Reputation: 2101

Conditionally rename elements within groups based on previous observations in tidyverse

My dataframe contains the following elements:

1) Each user_id may have several order_id 2) Each order_id can be of 2 types: monthly orders have a cycle of 1 month (30 days) or 3 months (90 days). The users - during his lifetime - can switch from one plan_type to the other multiple times.

library(tidyverse)
df_input <- tibble::tribble(
              ~user_id, ~order_id,        ~date, ~plan_type,
                     1,       123, "01-01-2020",  "monthly",
                     1,       124, "01-31-2020",  "monthly",
                     1,       125, "03-01-2020", "3-months",
                     2,       126, "01-11-2019", "3-months",
                     2,       127, "10-13-2018",  "monthly",
                     2,       128, "11-12-2018",  "monthly",
                     3,       129, "01-10-2019", "3-months",
                     3,       130, "04-10-2019", "3-months",
                     3,       131, "07-09-2019", "3-months",
                     4,       132, "01-02-2020",  "monthly",
                     4,       133, "02-01-2020",  "monthly"
              )

> df_input
# A tibble: 11 x 4
   user_id order_id date       plan_type
     <dbl>    <dbl> <chr>      <chr>    
 1       1      123 01-01-2020 monthly  
 2       1      124 01-31-2020 monthly  
 3       1      125 03-01-2020 3-months 
 4       2      126 01-11-2019 3-months 
 5       2      127 10-13-2018 monthly  
 6       2      128 11-12-2018 monthly  
 7       3      129 01-10-2019 3-months 
 8       3      130 04-10-2019 3-months 
 9       3      131 07-09-2019 3-months 
10       4      132 01-02-2020 monthly  
11       4      133 02-01-2020 monthly  

I want to create 2 extra columns as follows:

1) order_type: I want to rank the order_ids grouping by user_id and date; after ranking the order by date, I want to name them acquisition if rank = 1 (first order), repeat otherwise.

I managed to do it as follows:

df_input %>%
  group_by(user_id) %>%
  mutate(rank = row_number(date)) %>%
  mutate(order_type = ifelse(rank == '1','acquisition','repeats'))

2) I want to create a behavior_type column. I want to flag each order_id as upgrade, downgrade or none. If the user goes from monthly to 3-months the order is going to be upgrade, if from 3-months to monthly a downgrade, none otherwise.

My final dataframe will have to look like:

df_final <- tibble::tribble(
              ~order_id, ~behavior_type,   ~order_type,
                    123,         "none", "acquisition",
                    124,         "none",      "repeat",
                    125,      "upgrade",      "repeat",
                    126,         "none", "acquisition",
                    127,    "downgrade",      "repeat",
                    128,         "none",      "repeat",
                    129,         "none", "acquisition",
                    130,         "none",      "repeat",
                    131,         "none",      "repeat",
                    132,         "none", "acquisition",
                    133,         "none",      "repeat"
              )

> df_final
# A tibble: 11 x 3
   order_id behavior_type order_type 
      <dbl> <chr>         <chr>      
 1      123 none          acquisition
 2      124 none          repeat     
 3      125 upgrade       repeat     
 4      126 none          acquisition
 5      127 downgrade     repeat     
 6      128 none          repeat     
 7      129 none          acquisition
 8      130 none          repeat     
 9      131 none          repeat     
10      132 none          acquisition
11      133 none          repeat     

Any help for this second step

Upvotes: 1

Views: 38

Answers (3)

Hamza Chennaq
Hamza Chennaq

Reputation: 66

library(dplyr)

df_input %>%
  group_by(user_id) %>%
  mutate(rank = row_number(date)) %>%
  mutate(order_type = if_else(rank == "1","acquisition","repeats"),
         behavior_type = if_else(rank == "1", "none",
                                 if_else(plan_type != lag(plan_type) & plan_type == "3-months", "upgrade",
                                         if_else(plan_type != lag(plan_type) & plan_type == "monthly", "downgrade", 
                                                 "none"))))

# A tibble: 11 x 7
# Groups:   user_id [4]
   user_id order_id date       plan_type  rank order_type  behavior_type
     <dbl>    <dbl> <chr>      <chr>     <int> <chr>       <chr>        
 1       1      123 01-01-2020 monthly       1 acquisition none         
 2       1      124 01-31-2020 monthly       2 repeats     none         
 3       1      125 03-01-2020 3-months      3 repeats     upgrade      
 4       2      126 01-11-2019 3-months      1 acquisition none         
 5       2      127 10-13-2018 monthly       2 repeats     downgrade    
 6       2      128 11-12-2018 monthly       3 repeats     none         
 7       3      129 01-10-2019 3-months      1 acquisition none         
 8       3      130 04-10-2019 3-months      2 repeats     none         
 9       3      131 07-09-2019 3-months      3 repeats     none         
10       4      132 01-02-2020 monthly       1 acquisition none         
11       4      133 02-01-2020 monthly       2 repeats     none  

Upvotes: 0

Ahorn
Ahorn

Reputation: 3876

Like this?

library(dplyr)
#just repeating your first step
df_interim <- df_input %>%
  group_by(user_id) %>%
  mutate(rank = row_number(date)) %>%
  mutate(order_type = ifelse(rank == '1','acquisition','repeats'))

df_interim %>% 
  mutate(behavior_type = case_when(plan_type == "monthly" & lag(plan_type) == "3-months" ~ "downgrade",
                                   plan_type == "3-months" & lag(plan_type) == "monthly" ~ "upgrade",
                                   TRUE ~ "none")) %>% 
  ungroup() %>% 
  select(order_id, behavior_type, order_type)

# A tibble: 11 x 3
   order_id behavior_type order_type 
      <dbl> <chr>         <chr>      
 1      123 none          acquisition
 2      124 none          repeats    
 3      125 upgrade       repeats    
 4      126 none          acquisition
 5      127 downgrade     repeats    
 6      128 none          repeats    
 7      129 none          acquisition
 8      130 none          repeats    
 9      131 none          repeats    
10      132 none          acquisition
11      133 none          repeats

Upvotes: 1

r.user.05apr
r.user.05apr

Reputation: 5456

The order-type part could be simpler like so:

  group_by(user_id) %>%
  mutate(order_type = if_else(as.Date(date, "%m-%d-%Y") == min(as.Date(date, "%m-%d-%Y")), "acquisition", "repeat"))

Upvotes: 0

Related Questions