Reputation: 31454
Finding the last position of a vector that is less than a given value is fairly straightforward (see e.g. this question
But, doing this line by line for a column in a data.frame or data.table is horribly slow. For example, we can do it like this (which is ok on small data, but not good on big data)
library(data.table)
set.seed(123)
x = sort(sample(20,5))
# [1] 6 8 15 16 17
y = data.table(V1 = 1:20)
y[, last.x := tail(which(x <= V1), 1), by = 1:nrow(y)]
# V1 last.x
# 1: 1 NA
# 2: 2 NA
# 3: 3 NA
# 4: 4 NA
# 5: 5 NA
# 6: 6 1
# 7: 7 1
# 8: 8 2
# 9: 9 2
# 10: 10 2
# 11: 11 2
# 12: 12 2
# 13: 13 2
# 14: 14 2
# 15: 15 3
# 16: 16 4
# 17: 17 5
# 18: 18 5
# 19: 19 5
# 20: 20 5
Is there a fast, vectorised way to get the same thing? Preferably using data.table or base R.
Upvotes: 3
Views: 97
Reputation: 67818
You may use findInterval
y[ , last.x := findInterval(V1, x)]
Slightly more convoluted using cut
. But on the other hand, you get the NA
s right away:
y[ , last.x := as.numeric(cut(V1, c(x, Inf), right = FALSE))]
Upvotes: 2
Reputation: 992
Pretty simple in base R
x<-c(6L, 8L, 15L, 16L, 17L)
y<-1:20
cumsum(y %in% x)
[1] 0 0 0 0 0 1 1 2 2 2 2 2 2 2 3 4 5 5 5 5
Upvotes: -1