Reputation: 143
I have a column of a data.table:
DT = data.table(R=c(3,8,5,4,6,7))
I also have a vector of upper cluster limits for clusters 1, 2, 3 and 4:
CP=c(2,4,6,8)
Now I want to compare each entry of R with the elements of CP, taking the order of CP into account. The result
DT[,NoC:=c(2,4,3,2,3,4)]
should be a column NoC in DT whose entries are the number of the cluster that each element of R belongs to. (I need the cluster number to pick a factor from another data.table.)
For example take the 1st entry of R: 3 is not smaller than 2 (out of CP), but smaller than 4 (out of CP). So, 3 belongs to cluster 2.
Another example: take the 6th entry of R. 7 is not smaller than 2, 4 or 6 (out of CP), but smaller than 8 (out of CP). So, 7 belongs to cluster 4.
How can I do that without using if-clauses?
Upvotes: 3
Views: 1463
Reputation: 263499
From your description this would seem to be the code to deliver the correct answers, but Arun, a most skillful data.tablist, seems to have come up with a completely different way to fit your expectations, so I think there must be a different way of reading your requirements.
> DT[ , NoC:= findInterval(R, c(0, 2,4,6,8) , rightmost.closed=TRUE)]
> DT
   R NoC
1: 3   2
2: 8   4
3: 5   3
4: 4   3
5: 6   4
6: 7   4
I'm also very puzzled that findInterval is assigning the 5th item to the 4th interval, since 6 is not greater than the upper boundary of the third interval (6).
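One likely explanation, offered as a sketch rather than a definitive reading of the question: by default findInterval treats every interval as closed on the left and open on the right, so 4 lands in [4, 6) and 6 lands in [6, 8). Assuming R 3.3.0 or later, where the left.open argument is available, switching to left-open, right-closed intervals seems to reproduce the column the question asks for:

```r
library(data.table)

DT <- data.table(R = c(3, 8, 5, 4, 6, 7))
CP <- c(2, 4, 6, 8)

# Default behaviour: intervals are [lower, upper), so a value equal to an
# upper limit is pushed into the next cluster.
default_idx <- findInterval(DT$R, c(0, CP), rightmost.closed = TRUE)

# left.open = TRUE uses (lower, upper] instead, so a value equal to an
# upper limit stays in that limit's cluster.
DT[, NoC := findInterval(R, c(0, CP), left.open = TRUE)]
```

The lower bound 0 is carried over from the call above; any R at or below it would get index 0.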
Upvotes: 0
Reputation: 118889
You can accomplish this using rolling joins:
data.table(CP, key="CP")[DT, roll=-Inf, which=TRUE]
# [1] 2 4 3 2 3 4
roll=-Inf performs a NOCB rolling join (Next Observation Carried Backward). That is, when a value falls in a gap, the next observation is rolled backward. For example, 7 falls between 6 and 8; the next value, 8, is rolled backward. We then simply get the index of each match using which=TRUE.
You can just add this as a column to DT using := as you've shown.
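Put together, that could look like the following minimal sketch. The intermediate name idx is only for illustration; DT and CP are taken from the question:

```r
library(data.table)

DT <- data.table(R = c(3, 8, 5, 4, 6, 7))
CP <- c(2, 4, 6, 8)

# For each R, find the row of the keyed lookup table whose CP is the next
# value at or above R (NOCB rolling join); which = TRUE returns row indices.
idx <- data.table(CP, key = "CP")[DT, roll = -Inf, which = TRUE]

# Store those indices as the cluster number column.
DT[, NoC := idx]
```

A value of R above the largest limit has no next observation to roll backward, so it would presumably come back as NA.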
Note that this will return the indices after ordering CP. In your example, CP is already ordered, so it returns the result as intended. If CP is not already ordered, you'll have to add an additional column and extract that column instead of using which=TRUE. But I'll leave it to you to work it out.
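For the unordered case, one possible way to work it out is sketched below. It assumes the cluster number should be the limit's position in the original, unsorted vector. The names CP_unordered, lookup, pos and orig_pos are hypothetical and not from the question:

```r
library(data.table)

DT <- data.table(R = c(3, 8, 5, 4, 6, 7))
CP_unordered <- c(6, 2, 8, 4)  # hypothetical: same limits, shuffled

# Record each limit's original position before keying sorts the table.
lookup <- data.table(CP = CP_unordered,
                     pos = seq_along(CP_unordered),
                     key = "CP")

# Same NOCB rolling join, but extract the stored position instead of the
# row index that which = TRUE would return.
orig_pos <- lookup[DT, roll = -Inf]$pos

DT[, NoC := orig_pos]
```

With these shuffled limits, 3 rolls back to the limit 4, which sits at position 4 of CP_unordered, so its NoC is 4 rather than 2.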
Upvotes: 4