Reputation: 489
I am working on a dataset in which I need to calculate how long does it take for a retail store to replenish some products from shortage, and here is a quick view of the dataset in the simplest form:
Date <- c("2019-1-1","2019-1-2","2019-1-3","2019-1-4","2019-1-5","2019-1-6","2019-1-7","2019-1-8")
Product <- rep("Product A",8)
Net_Available_Qty <- c(-2,-2,10,8,-5,-6,-7,0)
sample_df <- data.frame(Date,Product,Net_Available_Qty)
When the Net_Available_Qty becomes negative, it means there is a shortage. When it turns back to 0 or positive qty, it means the supply has been recovered. What I need to calculate is the days between when we first see shortage and when it is recovered. In this case, for the 1st shortage, it took 2 days to recover and for the second shortage, it took 3 days to recover.
A tidyverse solution would be most welcome.
Upvotes: 1
Views: 113
Reputation: 23757
@Schilker had a great idea using rle
. I am building on his answer and offering a slightly shorter version including the use of cumsum
Date <- c("2019-1-1","2019-1-2","2019-1-3","2019-1-4","2019-1-5","2019-1-6","2019-1-7","2019-1-8")
Product <- rep("Product A",8)
Net_Available_Qty <- c(-2,-2,10,8,-5,-6,-7,0)
sample_df <- data.frame(Date,Product,Net_Available_Qty)
library(tidyverse)
sample_df %>%
mutate(
diffDate = c(1, diff(as.Date(Date))),
sequence = sequence(rle(Net_Available_Qty >= 0)$lengths),
group = cumsum(c(TRUE, diff(sequence)) != 1L)
) %>%
group_by(group) %>%
mutate(n_days = max(cumsum(diffDate)))
#> # A tibble: 8 x 7
#> # Groups: group [4]
#> Date Product Net_Available_Qty diffDate sequence group n_days
#> <fct> <fct> <dbl> <dbl> <int> <int> <dbl>
#> 1 2019-1-1 Product A -2 1 1 0 2
#> 2 2019-1-2 Product A -2 1 2 0 2
#> 3 2019-1-3 Product A 10 1 1 1 2
#> 4 2019-1-4 Product A 8 1 2 1 2
#> 5 2019-1-5 Product A -5 1 1 2 3
#> 6 2019-1-6 Product A -6 1 2 2 3
#> 7 2019-1-7 Product A -7 1 3 2 3
#> 8 2019-1-8 Product A 0 1 1 3 1
Created on 2020-02-23 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 505
I hope someone else finds a cleaner solution. But this produces diffDate
which assigns the date difference from when a negative turns positive/zero.
sample_df %>%
mutate(sign = ifelse(Net_Available_Qty > 0, "pos", ifelse(Net_Available_Qty < 0, "neg", "zero")),
sign_lag = lag(sign, default = sign[1]), # get previous value (exception in the first place)
change = ifelse(sign != sign_lag, 1 , 0), # check if there's a change
sequence=sequence(rle(as.character(sign))$lengths)) %>%
group_by(sequence) %>%
mutate(diffDate = as.numeric(difftime(Date, lag(Date,1))),
diffDate=ifelse(Net_Available_Qty <0, NA, ifelse((sign=='pos'| sign=='zero') & sequence==1, diffDate, NA))) %>%
ungroup() %>%
select(Date, Product, Net_Available_Qty, diffDate)
Upvotes: 2