For each value of one column, find which is the last value of another vector that is lower

Question

Finding the last position of a vector that is less than a given value is fairly straightforward (see e.g. this question

But, doing this line by line for a column in a data.frame or data.table is horribly slow. For example, we can do it like this (which is ok on small data, but not good on big data)

library(data.table)
set.seed(123)
x = sort(sample(20,5))
# [1]  6  8 15 16 17
y = data.table(V1 = 1:20)

y[, last.x := tail(which(x <= V1), 1), by = 1:nrow(y)]
#     V1 last.x
# 1:   1     NA
# 2:   2     NA
# 3:   3     NA
# 4:   4     NA
# 5:   5     NA
# 6:   6      1
# 7:   7      1
# 8:   8      2
# 9:   9      2
# 10: 10      2
# 11: 11      2
# 12: 12      2
# 13: 13      2
# 14: 14      2
# 15: 15      3
# 16: 16      4
# 17: 17      5
# 18: 18      5
# 19: 19      5
# 20: 20      5

Is there a fast, vectorised way to get the same thing? Preferably using data.table or base R.

Henrik · Accepted Answer

You may use findInterval

y[ , last.x := findInterval(V1, x)]

Slightly more convoluted using cut. But on the other hand, you get the NAs right away:

y[ , last.x := as.numeric(cut(V1, c(x, Inf), right = FALSE))]

For each value of one column, find which is the last value of another vector that is lower

Answers (2)

Related Questions