ANam
ANam

Reputation: 347

How to find the minimum by group based on a condition using dplyr?

So I have a random data frame that can be created with this code

library(dplyr)
dates <- seq(as.Date("2015-01-01"),as.Date("2015-12-31"),1)
weekdays <- weekdays(dates)

res <- data.frame(dates,weekdays)

res$customer <- ifelse(dates > as.Date("2015-02-05"), "Google", "Apple")

res$order_flag <- ifelse(weekdays == "Wednesday", 1, 0)

I am trying to create a flag that equates to one for the first time that order_flag==1 for each customer. The end result here would only have two instances where this new flag = 1. I tried to do this in dplyr this way:

new_data <- res %>% group_by(customer) %>% mutate(min_date = which.min(order_flag ==1)) 

but that didn't seem to work.

Upvotes: 0

Views: 1005

Answers (2)

user12256545
user12256545

Reputation: 3022

use slice:

new_data <- res %>% group_by(customer) %>% 
  filter(order_flag == 1) %>% 
  slice(1) %>% mutate(first_order= 1) %>% right_join(res)


Upvotes: 0

IceCreamToucan
IceCreamToucan

Reputation: 28705

Make a vector of all 0s with integer(n()), then replace element which.max(order_flag == 1) with 1.

res %>% 
  group_by(customer) %>% 
  mutate(min_date = replace(integer(n()), which.max(order_flag == 1), 1))

#  # A tibble: 365 x 5
# # Groups:   customer [2]
#    dates      weekdays  customer order_flag min_date
#    <date>     <fct>     <chr>         <dbl>    <dbl>
#  1 2015-01-01 Thursday  Apple             0        0
#  2 2015-01-02 Friday    Apple             0        0
#  3 2015-01-03 Saturday  Apple             0        0
#  4 2015-01-04 Sunday    Apple             0        0
#  5 2015-01-05 Monday    Apple             0        0
#  6 2015-01-06 Tuesday   Apple             0        0
#  7 2015-01-07 Wednesday Apple             1        1
#  8 2015-01-08 Thursday  Apple             0        0
#  9 2015-01-09 Friday    Apple             0        0
# 10 2015-01-10 Saturday  Apple             0        0
# # ... with 355 more rows

If you have any groups without any order_flag equal to 1, which.max can have odd behavior

which.max(c(FALSE, FALSE, FALSE, FALSE))
# [1] 1

To prevent this, here's another option which doesn't have that issue.

res %>% 
  group_by(customer) %>% 
  mutate(min_date = order_flag == 1,
         min_date = as.integer(min_date & cumsum(min_date) == 1))

Upvotes: 2

Related Questions