Reputation: 891
How can I find the last value, prior to test.day, for each (loc.x
, loc.y
) pair?
dt <- data.table(
loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
time = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
"2015-11-25", "2014-09-13", "2015-08-19")),
value = letters[1:6]
)
setkey(dt, loc.x, loc.y, time)
test.day <- as.IDate("2015-10-01")
Required output:
loc.x loc.y value
1: 1 1 a
2: 1 2 f
3: 3 1 c
Upvotes: 8
Views: 1473
Reputation: 92300
Here's another option using a rolling join after creating a lookup table
indx <- data.table(unique(dt[ ,.(loc.x, loc.y)]), time = test.day)
dt[indx, roll = TRUE, on = names(indx)]
# loc.x loc.y time value
# 1: 1 1 2015-10-01 a
# 2: 1 2 2015-10-01 f
# 3: 3 1 2015-10-01 c
Or a very similar option suggested by @eddi
dt[dt[, .(time = test.day), by = .(loc.x, loc.y)], roll = T, on = c('loc.x', 'loc.y', 'time')]
Or a one liner which will be less efficient as it will call [.data.table
by group
dt[,
.SD[data.table(test.day), value, roll = TRUE, on = c(time = "test.day")],
by = .(loc.x, loc.y)
]
# loc.x loc.y V1
# 1: 1 1 a
# 2: 1 2 f
# 3: 3 1 c
Upvotes: 6
Reputation: 70336
You can first subset the rows where time < test.day
(which should be quite efficient because it is not done by group) and then select the last value
per group. To do that you can either use tail(value, 1L)
or, as suggested by Floo0, value[.N]
, resulting in:
dt[time < test.day, tail(value, 1L), by = .(loc.x, loc.y)]
# loc.x loc.y V1
#1: 1 1 a
#2: 1 2 f
#3: 3 1 c
or
dt[time < test.day, value[.N], by = .(loc.x, loc.y)]
Note that this works because the data is sorted due to setkey(dt, loc.x, loc.y, time)
.
Upvotes: 6
Reputation: 83275
Another option is to use the last
function:
dt[, last(value[time < test.day]), by = .(loc.x, loc.y)]
which gives:
loc.x loc.y V1
1: 1 1 a
2: 1 2 f
3: 3 1 c
Upvotes: 6