Felix Zhao
Felix Zhao

Reputation: 489

dplyr: calculate the days for product replenishment

I am working on a dataset in which I need to calculate how long does it take for a retail store to replenish some products from shortage, and here is a quick view of the dataset in the simplest form:

enter image description here

Date <- c("2019-1-1","2019-1-2","2019-1-3","2019-1-4","2019-1-5","2019-1-6","2019-1-7","2019-1-8")
Product <- rep("Product A",8)
Net_Available_Qty <- c(-2,-2,10,8,-5,-6,-7,0)
sample_df <- data.frame(Date,Product,Net_Available_Qty)

When the Net_Available_Qty becomes negative, it means there is a shortage. When it turns back to 0 or positive qty, it means the supply has been recovered. What I need to calculate is the days between when we first see shortage and when it is recovered. In this case, for the 1st shortage, it took 2 days to recover and for the second shortage, it took 3 days to recover.

A tidyverse solution would be most welcome.

Upvotes: 1

Views: 113

Answers (2)

tjebo
tjebo

Reputation: 23757

@Schilker had a great idea using rle. I am building on his answer and offering a slightly shorter version including the use of cumsum

Date <- c("2019-1-1","2019-1-2","2019-1-3","2019-1-4","2019-1-5","2019-1-6","2019-1-7","2019-1-8")
Product <- rep("Product A",8)
Net_Available_Qty <- c(-2,-2,10,8,-5,-6,-7,0)
sample_df <- data.frame(Date,Product,Net_Available_Qty)

library(tidyverse)

sample_df %>%
  mutate(
    diffDate = c(1, diff(as.Date(Date))),
    sequence = sequence(rle(Net_Available_Qty >= 0)$lengths),
    group = cumsum(c(TRUE, diff(sequence)) != 1L)
  ) %>%
  group_by(group) %>%
  mutate(n_days = max(cumsum(diffDate)))
#> # A tibble: 8 x 7
#> # Groups:   group [4]
#>   Date     Product   Net_Available_Qty diffDate sequence group n_days
#>   <fct>    <fct>                 <dbl>    <dbl>    <int> <int>  <dbl>
#> 1 2019-1-1 Product A                -2        1        1     0      2
#> 2 2019-1-2 Product A                -2        1        2     0      2
#> 3 2019-1-3 Product A                10        1        1     1      2
#> 4 2019-1-4 Product A                 8        1        2     1      2
#> 5 2019-1-5 Product A                -5        1        1     2      3
#> 6 2019-1-6 Product A                -6        1        2     2      3
#> 7 2019-1-7 Product A                -7        1        3     2      3
#> 8 2019-1-8 Product A                 0        1        1     3      1

Created on 2020-02-23 by the reprex package (v0.3.0)

Upvotes: 0

Schilker
Schilker

Reputation: 505

I hope someone else finds a cleaner solution. But this produces diffDate which assigns the date difference from when a negative turns positive/zero.

sample_df %>%
  mutate(sign = ifelse(Net_Available_Qty > 0, "pos", ifelse(Net_Available_Qty < 0, "neg", "zero")),
         sign_lag = lag(sign, default = sign[1]),       # get previous value (exception in the first place)
         change = ifelse(sign != sign_lag, 1 , 0),      # check if there's a change
         sequence=sequence(rle(as.character(sign))$lengths)) %>%
  group_by(sequence) %>%
  mutate(diffDate = as.numeric(difftime(Date, lag(Date,1))),
         diffDate=ifelse(Net_Available_Qty <0, NA, ifelse((sign=='pos'| sign=='zero') & sequence==1, diffDate, NA))) %>% 
  ungroup() %>%
  select(Date, Product, Net_Available_Qty, diffDate)

Upvotes: 2

Related Questions