Reputation: 581
I have a dataset that is structured as follows:
ID origin destination time
1 a b 2
2 b a 1
2 a c 4
3 c b 1
3 b c 3
I would like to turn this into a ID-time panel dataset like:
ID location time
1 a 1
1 b 2
1 b 3
1 b 4
2 a 1
2 a 2
2 a 3
2 c 4
3 b 1
3 b 2
3 c 3
3 c 4
So basically, I need to create the panel rows for when a subject doesn't change location, and fill in the location they are supposed to be at based on the info on origin and destinations. Is there any function in R that can do this smoothly? I'd prefer solutions using data.table or dplyr.
Upvotes: 4
Views: 195
Reputation: 49448
I don't think you need to do fancy joins for this problem:
maxt = max(dt$time)
dt[, .(location = c(rep(origin[1], time[1] - 1), rep(destination, diff(c(time, maxt + 1)))),
time = 1:maxt), by = ID]
# ID location time
# 1: 1 a 1
# 2: 1 b 2
# 3: 1 b 3
# 4: 1 b 4
# 5: 2 a 1
# 6: 2 a 2
# 7: 2 a 3
# 8: 2 c 4
# 9: 3 b 1
#10: 3 b 2
#11: 3 c 3
#12: 3 c 4
I've assumed that within a single ID next origin is the same as previous destination, as per the OP example.
Upvotes: 3
Reputation: 14370
A similar method to Frank's solution, but using two joins would be:
library(data.table)
res <- setDT(expand.grid(ID = unique(dt$ID), time = 1:4))
#Get origin
res[dt[,.(ID, origin, time = time - 1L)], location := origin, on = .(ID = ID, time = time)]
#Update origin and destination
res[dt, location := destination, on = c("ID", "time")][, location := zoo::na.locf(location), by = ID][order(ID, time)]
# ID time location
#1: 1 1 a
#2: 1 2 b
#3: 1 3 b
#4: 1 4 b
#5: 2 1 a
#6: 2 2 a
#7: 2 3 a
#8: 2 4 c
#9: 3 1 b
#10: 3 2 b
#11: 3 3 c
#12: 3 4 c
Upvotes: 3
Reputation: 66819
You could make a table with every time
for which you want to know the location
of each ID
:
newDT = DT[, CJ(ID = unique(ID), time = 1:4)]
Then put the original data in long format, inferring that
time-1
time
mDT = melt(DT, id = c("ID", "time"), value.name = "loc", variable.name = "loc_role")
mDT[loc_role == "origin", time := time - 1L]
mDT[, loc_role := NULL]
setorder(mDT, ID, time)
ID time loc
1: 1 1 a
2: 1 2 b
3: 2 0 b
4: 2 1 a
5: 2 3 a
6: 2 4 c
7: 3 0 c
8: 3 1 b
9: 3 2 b
10: 3 3 c
...and fill in the new table with a rolling update join:
newDT[, location := mDT[.SD, on=.(ID, time), roll=TRUE, x.loc]]
ID time location
1: 1 1 a
2: 1 2 b
3: 1 3 b
4: 1 4 b
5: 2 1 a
6: 2 2 a
7: 2 3 a
8: 2 4 c
9: 3 1 b
10: 3 2 b
11: 3 3 c
12: 3 4 c
(Dplyr doesn't have rolling or update joins yet, so I guess there's no analogue.)
How it works
CJ
takes the Cartesian product of some vectors, similar to expand.grid
melt
transforms to long form, keeping variables passed as id =
x[i, v := expr]
edits column v
of table x
on rows selected by i
setorder
sorts in place.SD
in j
of x[i,j]
refers to the subset of data (x
) selected by i
x[i, on=, roll=, expr]
is a rolling join, with rows selected by table i
, on=
and roll=
x.v
inside a join selects column v
from x
Regarding the last bullet, the prefix i.*
would do the same thing for columns from i
.
Upvotes: 4