Reputation: 497
I'm trying to create a calculated column using dplyr to get the days difference between the reference date(current) and a future date on a rolling basis. For e.g, I have a data frame like-
sample = data.frame(dates = seq(today(), today() + weeks(3), by = 1), qty =
floor(100 * rnorm(22)))
What I want to achieve is create a new column, say days_to which will be 0 if the qty >=0. However if qty < 0, then days_to should be the number of days till the qty goes above 0. If the qty doesn't go above 0 for any future date, then days_to = NA/Inf (not important). So for the above example it should look something like -
dates qty days_to
10/17/2018 175 0
10/18/2018 -69 2
10/19/2018 -20 1
10/20/2018 113 0
10/21/2018 7 0
10/22/2018 120 0
10/23/2018 48 0
10/24/2018 -31 NA
10/25/2018 -9 NA
10/26/2018 -87 NA
I need to do this for a large number of rows(~2M) on a grouped variable and hence trying to use dplyr to achieve this. Any help is appreciated.
Thanks!
Upvotes: 0
Views: 39
Reputation: 160687
dplyr
library(dplyr)
sampledplyr <- sample %>%
mutate(grp = cumsum(qty > 0 & lag(qty) < 0)) %>%
group_by(grp) %>%
mutate(days_to = if_else(qty < 0, n() - row_number() + 1L, 0L)) %>%
ungroup() %>%
select(-grp)
print(sampledplyr, n=22)
# # A tibble: 22 x 3
# dates qty days_to
# <date> <dbl> <int>
# 1 2018-10-17 -63 1
# 2 2018-10-18 18 0
# 3 2018-10-19 -84 1
# 4 2018-10-20 159 0
# 5 2018-10-21 32 0
# 6 2018-10-22 -83 1
# 7 2018-10-23 48 0
# 8 2018-10-24 73 0
# 9 2018-10-25 57 0
# 10 2018-10-26 -31 1
# 11 2018-10-27 151 0
# 12 2018-10-28 38 0
# 13 2018-10-29 -63 2
# 14 2018-10-30 -222 1
# 15 2018-10-31 112 0
# 16 2018-11-01 -5 2
# 17 2018-11-02 -2 1
# 18 2018-11-03 94 0
# 19 2018-11-04 82 0
# 20 2018-11-05 59 0
# 21 2018-11-06 91 0
# 22 2018-11-07 78 0
data.table
library(data.table)
sampledt <- as.data.table(sample)
sampledt[,days_to := ifelse(qty < 0, .N - seq_len(nrow(.SD)) + 1L, 0L),
by = cumsum(qty > 0 & lag(qty) < 0)]
(Same output.)
Data:
set.seed(1) # alway
sample = data.frame(dates = seq(Sys.Date(), Sys.Date() + 3*7, by = 1),
qty = floor(100 * rnorm(22)))
Upvotes: 2