newbie
newbie

Reputation: 907

Map Value based on Specified Intervals

Let's say we have the following data.table:

library(data.table)
dt <- data.table(x=c(-0.01, -0.001, 0, 0.01,0.02,0.03,1,3,4,10,20,25), value=c(rep(1,3), rep(2, 3), rep(3, 2), rep(4, 2), rep(5, 2)))

Then, I specify intervals using the following:

library(Hmisc)
dt[, int:=cut2(x, cuts=c(-Inf, 0.01, 1, 4, 20, Inf))]

return,

         x value           int
 1: -0.010     1 [ -Inf, 0.01)
 2: -0.001     1 [ -Inf, 0.01)
 3:  0.000     1 [ -Inf, 0.01)
 4:  0.010     2 [ 0.01, 1.00)
 5:  0.020     2 [ 0.01, 1.00)
 6:  0.030     2 [ 0.01, 1.00)
 7:  1.000     3 [ 1.00, 4.00)
 8:  3.000     3 [ 1.00, 4.00)
 9:  4.000     4 [ 4.00,20.00)
10: 10.000     4 [ 4.00,20.00)
11: 20.000     5 [20.00,  Inf]
12: 25.000     5 [20.00,  Inf]

Assuming there is new data.table

dtnew <- data.table(x=c(-0.001, 0.4, 0.3, 5, 25))

        x
1: -0.001
2:  0.400
3:  0.300
4:  5.000
5: 25.000

What I want is to match values dtnew to intervals in dt so that I can obtain value in dt. The expected return would be

        x value           int
1: -0.001     1 [ -Inf, 0.01)
2:  0.400     2 [ 0.01, 1.00)
3:  0.300     2 [ 0.01, 1.00)
4:  5.000     4 [ 4.00,20.00)
5: 25.000     5 [20.00,  Inf]

I have tried to assign intervals to dtnew so that I can merge(dtnew, dt, by = 'int') using the following:

dtnew[, int:=cut2(y, cuts=c(-Inf, 0.01, 1, 4, 20, Inf))] which intervals are the same as used in dt. However, the interval in 1st row is different from dt as follows:

1: -0.001          -Inf
2:  0.400 [ 0.01, 1.00)
3:  0.300 [ 0.01, 1.00)
4:  5.000 [ 4.00,20.00)
5: 25.000 [20.00,  Inf]

Could you please give me suggestions?

Upvotes: 2

Views: 279

Answers (1)

Arun
Arun

Reputation: 118799

This is a simple rolling join:

dt[dtnew, on="x", roll=Inf]
#     x     value           int
# 1: -0.001     1 [ -Inf, 0.01)
# 2:  0.400     2 [ 0.01, 1.00)
# 3:  0.300     2 [ 0.01, 1.00)
# 4:  5.000     4 [ 4.00,20.00)
# 5: 25.000     5 [20.00,  Inf]

For each row in dtnew, the matching rows in dt is found by looking up on x. If a particular value from dtnew has no match, but rather it falls in a gap, i.e, between two values in dt, then carry over the last/previous value forward.

E.g., for dtnew$x = 0.4, it falls in between row 6 0.03 and row 7 1.00. And therefore the previous row (=6) is carried forward as the matching row.

Upvotes: 4

Related Questions