Reputation: 27
An example of my dataset:
library('lubridate')
id<-c(1,1,2,2,2,3,3,3)
admidate<-c(dmy('01/01/2010', '01/01/2011', '01/02/2010', '01/02/2011', '01/01/2012', '01/01/2010', '01/03/2010', '01/03/2011'))
disdate<-c(dmy('15/01/2010', '15/01/2011', '15/02/2010', '15/02/2011', '15/02/2012', '15/01/2010', '15/03/2010', '15/03/2011'))
first_adm<-c(1,0,1,0,0,1,0,0)
last_adm<-c(0,1,0,0,1,0,0,1)
intervention<-c(1,0,0,1,0,0,0,0)
df<-data.frame(id, admidate, disdate, first_adm, last_adm, intervention)
For each id
, I want to get a column of dates where:
intervention == 1
, time difference between admidate
for this row and last_adm==1
is returned. intervention == 0
, time difference between admidate
when first_adm==1
for that id
and last_adm==1
for that id
is returned. Each id
has only one first_adm
and only one last_adm
.
The purpose is to get follow times to compare between the two groups. For one it starts when they had an 'intervention'. For others who did not have intervention, it starts at first contact.
The df
finally would look something like this:
id admidate disdate first_adm last_adm intervention fu_time 1 2010-01-01 2010-01-15 1 0 1 - 1 2011-01-01 2011-01-15 0 1 0 365 2 2010-02-01 2010-02-15 1 0 0 - 2 2011-01-01 2011-02-15 0 0 1 - 2 2012-01-01 2012-02-15 0 1 0 365 3 2010-01-01 2010-01-15 1 0 0 - 3 2010-03-01 2010-03-15 0 0 0 - 3 2011-03-01 2011-03-15 0 1 0 431
Upvotes: 1
Views: 89
Reputation: 42544
This is what I would try with my preferred tool.
data.table
versionlibrary(data.table)
setDT(df)[, {
fa <- first(admidate[first_adm == 1])
la <- last(admidate[last_adm == 1])
fi <- first(admidate[intervention == 1])
ai <- any(intervention == 1)
.(fu_time = la - if (ai) pmax(fa, fi) else fa)
}, by = id]
which simply returns the result of each id
id fu_time 1: 1 365 days 2: 2 334 days 3: 3 424 days
Note that we still use first()
and last()
here as we are aggregating by id
although the OP has assured that each id
has only one first_adm
and only one last_adm
.
However the OP wants to have the original df
updated. With data.table
, we can append a new column where each row contains the result for the respective group:
setDT(df)[, fu_time := {
fa <- first(admidate[first_adm == 1])
la <- last(admidate[last_adm == 1])
fi <- first(admidate[intervention == 1])
ai <- any(intervention == 1)
la - if (ai) pmax(fa, fi) else fa
}, by = id]
Now, the original df
has been updated:
df[]
id admidate disdate first_adm last_adm intervention fu_time 1: 1 2010-01-01 2010-01-15 1 0 1 365 days 2: 1 2011-01-01 2011-01-15 0 1 0 365 days 3: 2 2010-02-01 2010-02-15 1 0 0 334 days 4: 2 2011-02-01 2011-02-15 0 0 1 334 days 5: 2 2012-01-01 2012-02-15 0 1 0 334 days 6: 3 2010-01-01 2010-01-15 1 0 0 424 days 7: 3 2010-03-01 2010-03-15 0 0 0 424 days 8: 3 2011-03-01 2011-03-15 0 1 0 424 days
If only the last row is to contain the result for each id
and if we want to fully simulate OP's expected result:
setDT(df)[, fu_time := {
fa <- first(admidate[first_adm == 1])
la <- last(admidate[last_adm == 1])
fi <- first(admidate[intervention == 1])
ai <- any(intervention == 1)
replace(rep("-", .N), .N, la - if (ai) pmax(fa, fi) else fa)
}, by = id]
df[]
id admidate disdate first_adm last_adm intervention fu_time 1: 1 2010-01-01 2010-01-15 1 0 1 - 2: 1 2011-01-01 2011-01-15 0 1 0 365 3: 2 2010-02-01 2010-02-15 1 0 0 - 4: 2 2011-02-01 2011-02-15 0 0 1 - 5: 2 2012-01-01 2012-02-15 0 1 0 334 6: 3 2010-01-01 2010-01-15 1 0 0 - 7: 3 2010-03-01 2010-03-15 0 0 0 - 8: 3 2011-03-01 2011-03-15 0 1 0 424
dplyr
versionAs the OP has tagged the question with mutate I suppose the OP might prefer a dplyr
approach:
library(dplyr)
df %>%
group_by(id) %>%
summarise(
fu_time = last(admidate[last_adm == 1]) -
if (any(intervention == 1))
pmax(first(admidate[first_adm == 1]), first(admidate[intervention == 1]))
else first(admidate[first_adm == 1])
) %>%
inner_join(df, .)
id admidate disdate first_adm last_adm intervention fu_time 1 1 2010-01-01 2010-01-15 1 0 1 365 days 2 1 2011-01-01 2011-01-15 0 1 0 365 days 3 2 2010-02-01 2010-02-15 1 0 0 334 days 4 2 2011-02-01 2011-02-15 0 0 1 334 days 5 2 2012-01-01 2012-02-15 0 1 0 334 days 6 3 2010-01-01 2010-01-15 1 0 0 424 days 7 3 2010-03-01 2010-03-15 0 0 0 424 days 8 3 2011-03-01 2011-03-15 0 1 0 424 days
Note that this will not update df
but creates a new object.
It is implicitely assumed that the dataset is already ordered by ascending admidate
within each id
group.
Upvotes: 2