Reputation: 15708
Consider the two following two datasets, with a 'time
' column which represents a general timestamp, integers are used for simplicity of illustrating the example:
library(data.table)
df_test_1 <-
data.table(time = c(1:10, seq(20, 30, by = 5))) %>%
df_test_1$values <- -df_test_1$time
df_test_1 <- setkey(df_test_1, time)
df_test_2 <-
data.table(time = c(15, 20, 26, 28, 31))
df_test_2 <- setkey(df_test_2, time)
so that:
> df_test_1
time values
...
5: 5 -5
6: 6 -6
7: 7 -7
8: 8 -8
9: 9 -9
10: 10 -10
11: 20 -20
12: 25 -25
13: 30 -30
and:
> df_test_2
time
1: 15
2: 20
3: 26
4: 28
5: 31
The rolling join df_test_1[df_test_2, roll = -Inf]
produces:
> df_test_1[df_test_2, roll = -Inf]
time values
1: 15 -20
2: 20 -20
3: 26 -30
4: 28 -30
5: 31 NA
That is, for each time
value in df_test_1
, find all time
values in df_test_2
smaller than or equal to it, and associate the corresponding value
to this row of df_test_2
. For example, df_test_1$time == 20
matches the time values 15
and 20
in df_test_2$time
, thus the corresponding value of -20
is associated to these rows of df_test_2
.
I would like to change the join condition (in bold above) to smaller than it, that is, the answer produced should be:
time values
1: 15 -20
2: 20 -25
3: 26 -30
4: 28 -30
5: 31 NA
The difference here is that the value at df_test_1$time == 25
should be matched to the row of df_test_2
where df_test_2$time == 20
.
An alternate way of producing the desired result would be to take away a minuscule portion from time
:
df_test_3 <-
df_test_1 %>%
mutate(time = time - 0.1) %>%
setkey(time)
so that:
> df_test_3[df_test_2, roll = -Inf]
time values
1: 15 -20
2: 20 -25
3: 26 -30
4: 28 -30
5: 31 NA
Upvotes: 3
Views: 395
Reputation: 118799
Using the new non-equi
joins feature in the current development version of data.table, this is straightforward:
# v1.9.7+
df_test_1[df_test_2, on=.(time > time), mult="first"]
Keyed joins are capable of only equi joins. on
argument is essential for conditional joins.
Note that there's no need for data.tables to be keyed if the on
argument is used. Even if you wish to key the data.tables, specifying on
is better as it helps understand the code immediately at a later point.
See the installation instructions for devel version here.
Upvotes: 2