Reputation: 455
I've got a dataset that looks like this:
id date outcome
ny21 2021-03-01 0
ny21 2021-02-01 1
ny21 2021-01-01 0
ch67 2021-08-09 0
I can calculate the time difference between rows grouped by id
in the following way:
unlist(tapply(df$date, INDEX = df$id, FUN = function(x) c(0, `units<-`(diff(x), "days"))))
However, I also want to calculate the time difference between rows grouped by id
, where outcome == 1
. If the current row has outcome == 1
, and no such previous row exists, I want to put 0. So, something like this:
id date outcome days_since_outcome_1
ny21 2021-03-01 0 28
ny21 2021-02-01 1 0
ny21 2021-01-01 0 0
ch67 2021-08-09 0 0
How can I do this?
EDIT:
In some situations, there are more than one outcome == 1
per id
. In this case, I would like the code to calculate the difference since the latest match (not the first match). Something like this:
id date outcome days_since_outcome_1
ny21 2021-05-01 1 30 # this is days since 2021-04-01
ny21 2021-04-01 1 59 # this is days since 2021-02-01
ny21 2021-03-01 0 28 # this is days since 2021-02-01 (first match)
ny21 2021-02-01 1 0
ny21 2021-01-01 0 0
ch67 2021-08-09 0 0
Upvotes: 2
Views: 162
Reputation: 388982
Using match
get the corresponding date
where outcome = 1
for a id
. Change the values which are before the outcome
date to 0 along with those values which do not have any value with outcome = 1
.
library(dplyr)
df %>%
mutate(date = as.Date(date)) %>%
group_by(id) %>%
mutate(days_since_outcome_1 = as.integer(date - date[match(1, outcome)]),
days_since_outcome_1 = replace(days_since_outcome_1,
days_since_outcome_1 < 0 | is.na(days_since_outcome_1), 0)) %>%
ungroup
# id date outcome days_since_outcome_1
# <chr> <date> <int> <dbl>
#1 ny21 2021-03-01 0 28
#2 ny21 2021-02-01 1 0
#3 ny21 2021-01-01 0 0
#4 ch67 2021-08-09 0 0
For larger datasets, you can try data.table
-
setDT(df)
df
df[, days_since_outcome_1 := as.integer(date - date[match(1, outcome)]), id]
df[, days_since_outcome_1 := replace(days_since_outcome_1,
days_since_outcome_1 < 0 | is.na(days_since_outcome_1), 0)]
df
Upvotes: 1