Reputation:
I have a dataset that list multiple records per ID
that have duration (start & end ) associated with them.
It is in the form:
ID start end treatment
1: 1 2009-12-01 2009-12-14 A
2: 1 2009-12-15 2009-12-31 A
3: 2 2009-12-01 2009-12-15 B
4: 2 2009-12-15 <NA> B
5: 3 2009-12-01 2009-12-15 A
6: 3 2009-12-10 2009-12-31 B
Or:
library(data.table)
data <- data.table(ID = c(1, 1, 2, 2, 3, 3),
start = anytime::anydate(c("2009-12-01", "2009-12-15",
"2009-12-01", "2009-12-15", "2009-12-01", "2009-12-10")),
end = anytime::anydate(c("2009-12-14", "2009-12-31", "2009-12-15", NA, "2009-12-15", "2009-12-31")),
treatment = c("A", "A", "B", "B", "A","B"))
I'm struggling to coerce it to the form where multiple records with the same ID
and same or next day start & end dates are collapsed and appropriate dates are retained. In case of example end product would look like:
ID start end treatment
1: 1 2009-12-01 2009-12-31 A
2: 2 2009-12-01 <NA> B
3: 3 2009-12-01 2009-12-15 A
4: 3 2009-12-10 2009-12-31 B
I tried solving that with some intermediate columns, groupuing and retaining unique rows but somehow have a feeling there is a better solution here
setorder(data, ID, start, end, na.last=FALSE)
data[, start2 := min(start), by = list(ID, treatment)]
data[, end2 := max(end), by = list(ID, treatment)]
unique(data, by = c("ID", "start2", "end2", "treatment"))
Any suggestions?
Upvotes: 1
Views: 337
Reputation: 4480
How about this?
data[, .(start = min(start),
end = max(end)), by=.(ID,treatment)][,.(ID,start,end,treatment)]
# ID start end treatment
# 1: 1 2009-12-01 2009-12-31 A
# 2: 2 2009-12-01 <NA> B
# 3: 3 2009-12-01 2009-12-15 A
# 4: 3 2009-12-10 2009-12-31 B
Upvotes: 1
Reputation: 389345
Group by rleid
of treatment
values get min
and max
of start
and end
columns.
library(data.table)
data[, .(start = min(start),
end = max(end),
treatment = first(treatment),
ID = first(ID)), rleid(treatment)]
# rleid start end treatment ID
#1: 1 2009-12-01 2009-12-31 A 1
#2: 2 2009-12-01 <NA> B 2
#3: 3 2009-12-01 2009-12-15 A 3
#4: 4 2009-12-10 2009-12-31 B 3
Instead of min
and max
you can also use first
and last
if your data is ordered.
Upvotes: 2