chopin_is_the_best
chopin_is_the_best

Reputation: 2101

Filter dataframe by conditions applied to groups of observations

Hello I need to filter this dataframe using the following conditions.

Here the dataframe in question:

# A tibble: 14 x 4
   user_id date    order_type  plan 
     <dbl> <chr>   <chr>       <chr>
 1     123 2019-02 acquisition 3M   
 2     123 2019-05 repeats     3M   
 3     123 2019-08 repeats     3M   
 4     124 2019-02 acquisition 1M   
 5     124 2019-03 repeats     3M   
 6     124 2019-06 repeats     3M   
 7     125 2019-08 acquisition 1M   
 8     125 2019-09 repeats     1M   
 9     126 2019-07 acquisition 3M   
10     126 2019-10 repeats     1M   
11     126 2019-11 repeats     1M   
12     127 2019-05 acquisition 3M   
13     127 2019-08 repeats     3M   
14     127 2019-11 repeats     3M   

Reproducible example:

df <- tibble::tribble(
        ~user_id,     ~date,   ~order_type, ~plan,
             123, "2019-02", "acquisition",  "3M",
             123, "2019-05",     "repeats",  "3M",
             123, "2019-08",     "repeats",  "3M",
             124, "2019-02", "acquisition",  "1M",
             124, "2019-03",     "repeats",  "3M",
             124, "2019-06",     "repeats",  "3M",
             125, "2019-08", "acquisition",  "1M",
             125, "2019-09",     "repeats",  "1M",
             126, "2019-07", "acquisition",  "3M",
             126, "2019-10",     "repeats",  "1M",
             126, "2019-11",     "repeats",  "1M",
             127, "2019-05", "acquisition",  "3M",
             127, "2019-08",     "repeats",  "3M",
             127, "2019-11",     "repeats",  "3M"
        )

I need to filter: * rows for which the filter user_id entry (noted as "acquisition") has a plan called "3M" * for those user_id all the subsequent orders identified as "3M"

Here the expected results:

# A tibble: 7 x 4
  user_id date    order_type  plan 
    <dbl> <chr>   <chr>       <chr>
1     123 2019-02 acquisition 3M   
2     123 2019-05 repeats     3M   
3     123 2019-08 repeats     3M   
4     126 2019-07 acquisition 3M   
5     127 2019-05 acquisition 3M   
6     127 2019-08 repeats     3M   
7     127 2019-11 repeats     3M   

Reproducible example:

df_filtered <- tibble::tribble(
                 ~user_id,     ~date,   ~order_type, ~plan,
                      123, "2019-02", "acquisition",  "3M",
                      123, "2019-05",     "repeats",  "3M",
                      123, "2019-08",     "repeats",  "3M",
                      126, "2019-07", "acquisition",  "3M",
                      127, "2019-05", "acquisition",  "3M",
                      127, "2019-08",     "repeats",  "3M",
                      127, "2019-11",     "repeats",  "3M"
                 )

Upvotes: 1

Views: 39

Answers (1)

Matt
Matt

Reputation: 7385

Here's a dplyr solution, but not sure it will capture this at a larger scale:

df %>%
  group_by(user_id) %>% 
  mutate(keep = case_when(any(plan == "3M" & order_type == "acquisition")~"Y", TRUE ~ "N")) %>%
  filter(keep == "Y" & plan != "1M")

Upvotes: 1

Related Questions