Xiaofeng Lu

Reputation: 86

`%between%` (data.table) gives me an odd result

I would like to subset a data.table by specific date ranges, and I tried both the between and inrange functions. I assumed that %between% would give me the result I want, but it produces an odd one. Here is some sample data, which I subset by two periods (2014-05-06 to 2014-05-14 and 2015-05-06 to 2015-05-14).

# Create a sample dataset    
library(data.table)
set.seed(1)
DT <- data.table(Date = seq.Date(from = as.Date("2014-01-01"),
                                 to = as.Date("2015-12-31"),
                                 by = 1),
                 Value = sample(365 * 2))

# Define the lower and upper ranges for the subsetting periods
lower = c(as.Date("2014-05-06"), as.Date("2015-05-06"))
upper = c(as.Date("2014-05-14"), as.Date("2015-05-14"))

# Try between function
DT[Date %between% list(lower, upper)]
# Some odd result
         Date Value
1: 2014-05-07   309
2: 2014-05-09   138
3: 2014-05-11   698
4: 2014-05-13    22
5: 2015-05-07   558
6: 2015-05-09   417
7: 2015-05-11   109
8: 2015-05-13   691

# Then try inrange function
DT[Date %inrange% list(lower, upper)]
# The results look good
          Date Value
 1: 2014-05-06   275
 2: 2014-05-07   309
 3: 2014-05-08   126
 4: 2014-05-09   138
 5: 2014-05-10   359
 6: 2014-05-11   698
 7: 2014-05-12    47
 8: 2014-05-13    22
 9: 2014-05-14   384
10: 2015-05-06     6
11: 2015-05-07   558
12: 2015-05-08   266
13: 2015-05-09   417
14: 2015-05-10    95
15: 2015-05-11   109
16: 2015-05-12   367
17: 2015-05-13   691
18: 2015-05-14   349

The inrange function produces the table I am after. Even after reading the data.table manual, I am still not clear on how the between function works, particularly when lower and upper are supplied as vectors defined outside DT. Could anyone give me a clue? Thank you.

Upvotes: 0

Views: 62

Answers (1)

chinsoon12

Reputation: 25225

The Details section of ?between says:

From v1.9.8+, between is vectorised. lower and upper are recycled to length(x) if necessary.

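Hence, in DT[Date %between% list(lower, upper)], lower and upper are recycled along the rows of DT, so each row is compared against only one of the two intervals. The call behaves more like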

DT[Date %between% list(rep(lower, DT[,.N/length(lower)]), rep(upper, DT[,.N/length(upper)]))]
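To make the recycling concrete, here is a minimal base-R sketch of the element-wise comparison that the quoted documentation describes. It is an illustration, not data.table's actual implementation; the short vector x and the helper names lo, up and keep are assumptions made for this example.

```r
# Base-R sketch of a recycled, element-wise "between" (no data.table needed).
# x is a short, hypothetical stand-in for DT$Date around the first period.
x     <- seq.Date(as.Date("2014-05-05"), as.Date("2014-05-15"), by = 1)
lower <- as.Date(c("2014-05-06", "2015-05-06"))
upper <- as.Date(c("2014-05-14", "2015-05-14"))

# Recycle both bounds to length(x): odd positions get the 2014 interval,
# even positions get the 2015 interval.
lo <- rep(lower, length.out = length(x))
up <- rep(upper, length.out = length(x))

# Element i of x is tested only against interval i of the recycled bounds.
keep <- x >= lo & x <= up
format(x[keep])
# → "2014-05-07" "2014-05-09" "2014-05-11" "2014-05-13"
```

Only every other date survives, because the dates in between are compared against the 2015 interval. This is the same alternating pattern as in the odd result from the question.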

Your understanding of inrange, on the other hand, is correct. As the documentation puts it:

inrange checks whether each value in x is in between any of the intervals provided in lower,upper.
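The "any of the intervals" semantics can be sketched in base R as well. Again, this only illustrates the behaviour described above and is not how data.table implements inrange; x and in_any are hypothetical names chosen for the example.

```r
# Base-R sketch of "inrange" semantics: keep a value if it falls in ANY interval.
# x is a short, hypothetical stand-in for DT$Date around the first period.
x     <- seq.Date(as.Date("2014-05-05"), as.Date("2014-05-15"), by = 1)
lower <- as.Date(c("2014-05-06", "2015-05-06"))
upper <- as.Date(c("2014-05-14", "2015-05-14"))

# For each date, compare it against every (lower, upper) pair
# and keep it if at least one pair contains it.
in_any <- vapply(
  seq_along(x),
  function(i) any(x[i] >= lower & x[i] <= upper),
  logical(1)
)
format(x[in_any])
# → the nine dates from "2014-05-06" to "2014-05-14"
```

If you prefer to stay with between, one option that should give the same rows is to test each interval separately with scalar bounds and combine the results with |, for example DT[Date %between% list(lower[1], upper[1]) | Date %between% list(lower[2], upper[2])].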

Upvotes: 1
