Reputation: 785
Consider a data.table
dt = data.table(id = rep(c('a','b'), each=2),
val = rep(c(1,2,3), times=c(1,2,1)))
# > dt
# id val
# 1: a 1
# 2: a 2
# 3: b 2
# 4: b 3
that we want to subset by id.
If we key by that column alone, no problem.
setkey(dt, id)
dt[J('a'), val]
# id val
# 1: a 1
# 2: a 2
dt[J('a'), range(val)]
# id V1
# 1: a 1
# 2: a 2
But if dt happens to also be keyed by the numeric column val, then that extra key column no longer seems to work in j.
setkey(dt, id, val)
dt[J('a'), val]
# id val
# 1: a 1
dt[J('a'), range(val)]
# id V1
# 1: a 1
# 2: a 1
## I would have expected same results here as when key(dt) == "id" only
Some values seem to be missing now...
unless we resort to a vector scan (which can be slow, and returns vectors here)
dt[id == 'a', val]
# [1] 1 2
dt[id == 'a', range(val)]
# [1] 1 2
or unless we explicitly set by (which throws a warning).
dt[J('a'), range(val), by = id]
# id V1
# 1: a 1
# 2: a 2
# Warning message:
# In `[.data.table`(dt, J("a"), range(val), by = id) :
# by is not necessary in this query; it equals all the join columns
# in the same order. j is already evaluated by group of x that each
# row of i matches to (by-without-by, see ?data.table). Setting by
# will be slower because a subset of x is taken and then grouped
# again. Consider removing by, or changing it.
What's going on please?
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] data.table_1.9.2
loaded via a namespace (and not attached):
[1] plyr_1.8.1 Rcpp_0.11.0 reshape2_1.2.2 stringr_0.6.2
[5] tools_3.0.1
Upvotes: 3
Views: 158
Reputation: 118799
Added tests (1351.1 and 1351.2) to catch any future regressions in the particular case of binary-search-based subsetting reported here on SO. Thanks to Scott for the post. The regression was contained to v1.9.2 AFAICT. Closes #734.
Scott, thanks for the report (and the follow-up comment). It seems to have occurred in 1.9.2 alone. I tested it on the current development version v1.9.3 and things seem to work as intended. Please check the README file for installation instructions.
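If it helps, here is a quick way to check whether the installed version is the one that seems to be affected. The helper name is_affected_version is made up for this illustration, and the check assumes, as stated above, that only 1.9.2 shows the problem.

```r
# Hypothetical helper (not part of data.table): TRUE when the given version
# is the release in which the regression was observed, i.e. exactly 1.9.2.
is_affected_version <- function(v) {
  as.package_version(v) == "1.9.2"
}

# Check the locally installed data.table and print a hint if it matches.
if (is_affected_version(packageVersion("data.table"))) {
  message("data.table 1.9.2 detected: keyed subsets with extra key columns may misbehave")
}
```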
I've filed issue #734 to remind us to add a test covering this usage, so that we don't miss it again during future changes.
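For anyone who wants to verify their own installation, a check in the spirit of that test could look roughly like the sketch below. It is not the actual test 1351.1 or 1351.2. The helper lookup_a is hypothetical, and the expected values come from an unkeyed vector scan of the example table in the question. Older versions return a data.table from a keyed join with j, while recent ones return a plain vector, so the helper normalises both to a vector.

```r
library(data.table)

dt <- data.table(id  = rep(c("a", "b"), each = 2),
                 val = rep(c(1, 2, 3), times = c(1, 2, 1)))

# Reference result from a vector scan on the unkeyed table.
expected <- dt[id == "a", val]

# Hypothetical helper: key a copy of dt by key_cols, subset id == "a"
# through the key, and return the val column as a plain vector.
lookup_a <- function(key_cols) {
  x <- copy(dt)
  setkeyv(x, key_cols)
  res <- x[J("a"), val]
  # Assumption: older versions return a data.table with a val column here.
  if (is.data.table(res)) res[["val"]] else res
}

res_one_key  <- lookup_a("id")
res_two_keys <- lookup_a(c("id", "val"))

# The extra key column must not change which rows the subset returns.
stopifnot(identical(res_one_key, expected),
          identical(res_two_keys, expected))
```

On an affected installation the second comparison should fail, because the two-column key drops the row with val equal to 2.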
Upvotes: 1