Reputation: 545
Problem: Suppose I have the below data.table
object (ex) which I want to merge with date_frame
object such that I get the desired output. The desired output shall be in the form such that the C_ states show only up once per CURRENT_DATE. However, if there is more than one E state per iid then these E states (with respective start date) shall show up multiple times per CURRENT_DATE. Any idea how to do that? Many thanks!!
## Desired Output
iid state start CURRENT_DATE
1 E 2020-01-02 2020-03-31
1 C_1 2020-01-03 2020-03-31
1 C_2 2020-01-05 2020-03-31
1 E 2020-01-06 2020-03-31
1 E 2020-01-08 2020-03-31
2 E 2020-01-10 2020-03-31
1 E 2020-01-02 2020-04-01
....
Toy example
library(data.table)
ex <- data.table(iid = c(1, 1, 1, 1, 1, 2),
state = c("E", "C_1", "C_2", "E", "E", "E"),
start = c("2020-01-02", "2020-01-03", "2020-01-05", "2020-01-06", "2020-01-08", "2020-01-10"))
s <- seq.Date(from = ex[, min(ymd(start))],
to = Sys.Date(),
"days")
date_frame <- CJ(CURRENT_DATE = s, iid = unique(ex$iid))
## not working
merge(ex, date_frame, all = TRUE, by = "iid")
Upvotes: 0
Views: 35
Reputation: 25225
Are you looking for something like:
ex[CJ(CURRENT_DATE=seq(max(start), min(start), by="-1 day"), iid, unique=TRUE),
on=.(iid), allow.cartesian=TRUE]
output:
iid state start CURRENT_DATE
1: 1 E 2020-01-02 2020-01-02
2: 1 C_1 2020-01-03 2020-01-02
3: 1 C_2 2020-01-05 2020-01-02
4: 1 E 2020-01-06 2020-01-02
5: 1 E 2020-01-08 2020-01-02
6: 2 E 2020-01-10 2020-01-02
7: 1 E 2020-01-02 2020-01-03
8: 1 C_1 2020-01-03 2020-01-03
9: 1 C_2 2020-01-05 2020-01-03
10: 1 E 2020-01-06 2020-01-03
11: 1 E 2020-01-08 2020-01-03
12: 2 E 2020-01-10 2020-01-03
13: 1 E 2020-01-02 2020-01-04
14: 1 C_1 2020-01-03 2020-01-04
15: 1 C_2 2020-01-05 2020-01-04
16: 1 E 2020-01-06 2020-01-04
17: 1 E 2020-01-08 2020-01-04
18: 2 E 2020-01-10 2020-01-04
19: 1 E 2020-01-02 2020-01-05
20: 1 C_1 2020-01-03 2020-01-05
21: 1 C_2 2020-01-05 2020-01-05
22: 1 E 2020-01-06 2020-01-05
23: 1 E 2020-01-08 2020-01-05
24: 2 E 2020-01-10 2020-01-05
25: 1 E 2020-01-02 2020-01-06
26: 1 C_1 2020-01-03 2020-01-06
27: 1 C_2 2020-01-05 2020-01-06
28: 1 E 2020-01-06 2020-01-06
29: 1 E 2020-01-08 2020-01-06
30: 2 E 2020-01-10 2020-01-06
31: 1 E 2020-01-02 2020-01-07
32: 1 C_1 2020-01-03 2020-01-07
33: 1 C_2 2020-01-05 2020-01-07
34: 1 E 2020-01-06 2020-01-07
35: 1 E 2020-01-08 2020-01-07
36: 2 E 2020-01-10 2020-01-07
37: 1 E 2020-01-02 2020-01-08
38: 1 C_1 2020-01-03 2020-01-08
39: 1 C_2 2020-01-05 2020-01-08
40: 1 E 2020-01-06 2020-01-08
41: 1 E 2020-01-08 2020-01-08
42: 2 E 2020-01-10 2020-01-08
43: 1 E 2020-01-02 2020-01-09
44: 1 C_1 2020-01-03 2020-01-09
45: 1 C_2 2020-01-05 2020-01-09
46: 1 E 2020-01-06 2020-01-09
47: 1 E 2020-01-08 2020-01-09
48: 2 E 2020-01-10 2020-01-09
49: 1 E 2020-01-02 2020-01-10
50: 1 C_1 2020-01-03 2020-01-10
51: 1 C_2 2020-01-05 2020-01-10
52: 1 E 2020-01-06 2020-01-10
53: 1 E 2020-01-08 2020-01-10
54: 2 E 2020-01-10 2020-01-10
iid state start CURRENT_DATE
data:
library(data.table)
ex <- data.table(iid = c(1, 1, 1, 1, 1, 2),
state = c("E", "C_1", "C_2", "E", "E", "E"),
start = c("2020-01-02", "2020-01-03", "2020-01-05", "2020-01-06", "2020-01-08", "2020-01-10"))
ex[, start := as.IDate(start)]
Upvotes: 1