Reputation: 86
I would like to subset a data.table by specific date ranges, and I tried both the between and inrange functions. I assumed that %between% would give me the result I want, but it produces an odd one. Here is the sample data, which I want to subset by two periods (2014-05-06 to 2014-05-14 and 2015-05-06 to 2015-05-14).
# Create a sample dataset
library(data.table)
set.seed(1)
DT <- data.table(Date = seq.Date(from = as.Date("2014-01-01"),
                                 to = as.Date("2015-12-31"),
                                 by = 1),
                 Value = sample(365 * 2))
# Define the lower and upper ranges for the subsetting periods
lower = c(as.Date("2014-05-06"), as.Date("2015-05-06"))
upper = c(as.Date("2014-05-14"), as.Date("2015-05-14"))
# Try between function
DT[Date %between% list(lower, upper)]
# Some odd result
Date Value
1: 2014-05-07 309
2: 2014-05-09 138
3: 2014-05-11 698
4: 2014-05-13 22
5: 2015-05-07 558
6: 2015-05-09 417
7: 2015-05-11 109
8: 2015-05-13 691
# Then try inrange function
DT[Date %inrange% list(lower, upper)]
# The results look good
Date Value
1: 2014-05-06 275
2: 2014-05-07 309
3: 2014-05-08 126
4: 2014-05-09 138
5: 2014-05-10 359
6: 2014-05-11 698
7: 2014-05-12 47
8: 2014-05-13 22
9: 2014-05-14 384
10: 2015-05-06 6
11: 2015-05-07 558
12: 2015-05-08 266
13: 2015-05-09 417
14: 2015-05-10 95
15: 2015-05-11 109
16: 2015-05-12 367
17: 2015-05-13 691
18: 2015-05-14 349
The inrange function produces the table I am after. Even after reading the data.table manual, I am still not clear about how the between function works, particularly when lower and upper are provided outside DT as vectors. Could anyone give me a clue? Thank you.
Upvotes: 0
Views: 62
Reputation: 25225
The Details section of ?between says: "From v1.9.8+, between is vectorised. lower and upper are recycled to length(x) if necessary."
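Hence, DT[Date %between% list(lower, upper)] behaves more like
DT[Date %between% list(rep(lower, DT[,.N/length(lower)]), rep(upper, DT[,.N/length(upper)]))]
That is, the bounds alternate row by row: odd rows are compared against the 2014 interval and even rows against the 2015 interval, which is why only every other date in each period is returned.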
Whereas your understanding of inrange is still correct, i.e. inrange checks whether each value in x is in between any of the intervals provided in lower, upper.
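To make the difference concrete, here is a minimal base R sketch that imitates the two behaviours without calling data.table at all. The names dates, between_like and inrange_like are made up for illustration, and the code only mirrors the recycling described in ?between for the version discussed here; it is not how data.table implements either function.

```r
# Same dates and bounds as in the question
dates <- seq.Date(from = as.Date("2014-01-01"), to = as.Date("2015-12-31"), by = 1)
lower <- as.Date(c("2014-05-06", "2015-05-06"))
upper <- as.Date(c("2014-05-14", "2015-05-14"))

# between-style: the bounds are recycled element-wise to length(dates),
# so element i is only compared against the i-th recycled pair of bounds
lo <- rep(lower, length.out = length(dates))
up <- rep(upper, length.out = length(dates))
between_like <- dates >= lo & dates <= up

# inrange-style: each date is checked against every interval,
# and is kept if it falls inside any of them
inrange_like <- vapply(seq_along(dates), function(i) {
  any(dates[i] >= lower & dates[i] <= upper)
}, logical(1))

dates[between_like]   # every other day of each period
dates[inrange_like]   # all days of both periods
```

With recycled bounds, a date is tested against only one interval, chosen by its row position, so half of the matching days are dropped. The any() over all intervals is what gives the full 18 rows.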
Upvotes: 1