Reputation: 22518
I have followed the data.table introduction, where a key is set on the x column of the data.table and then queried. When I set the key on the v column instead, it does not work as expected. Any ideas what I am doing wrong?
> set.seed(34)
> DT = data.table(x=c("b","b","b","a","a"),v=rnorm(5))
> DT
x v
1: b -0.1388900
2: b 1.1998129
3: b -0.7477224
4: a -0.5752482
5: a -0.2635815
> setkey(DT,v)
> DT[1.1998129,]
x v
1: b -0.7477224
EXPECTED:
x v
1: b 1.1998129
Upvotes: 3
Views: 120
Reputation: 49448
When the first argument of [.data.table is a number, it does not do a join but a simple row-number lookup. After the setkey, your data.table looks like this:
DT
# x v
#1: b -0.7477224
#2: a -0.5752482
#3: a -0.2635815
#4: b -0.1388900
#5: b 1.1998129
And since as.integer(1.1998129) is equal to 1, you get the first row.
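To make the difference between the two lookups concrete, here is a minimal sketch. It assumes the rounded values printed in the question rather than the exact output of rnorm, and it converts the number to an integer explicitly so the row lookup does not depend on how a given data.table version coerces a non-integer i.

```r
library(data.table)

# Assumption: values copied (rounded) from the question's printout
DT <- data.table(x = c("b", "b", "b", "a", "a"),
                 v = c(-0.1388900, 1.1998129, -0.7477224, -0.5752482, -0.2635815))
setkey(DT, v)                  # sorts the rows by v, ascending

idx <- as.integer(1.1998129)   # truncates toward zero
by_row <- DT[idx]              # row-number lookup: the smallest v
by_join <- DT[.(1.1998129)]    # keyed join on v: the matching row

print(by_row)
print(by_join)
```

The join only matches here because the table holds the same rounded literal that is being looked up.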
Now if you intended to do a join instead, you have to use the syntax DT[J(...)] or DT[.(...)], and that will work as expected, provided you use the correct number (as a convenience, J is not required when dealing with e.g. character columns, because DT["a"] has no other default meaning):
DT[J(v[5])]
# x v
#1: b 1.199813
Note that DT[J(1.1998129)] will not work, because:
DT$v[5] == 1.1998129
#[1] FALSE
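The comparison fails because the printed value is rounded to 7 significant digits by default, so the number on screen is not the number stored. A small base R sketch, using the full value quoted in this answer, shows exact versus tolerance-based comparison (all.equal is base R's approximate comparison, not what data.table uses internally):

```r
x <- 1.199812896606383683107              # stored value, as quoted in this answer

shown <- format(x, digits = 7)            # what default printing displays
exact <- x == 1.1998129                   # exact comparison with the rounded number
close <- isTRUE(all.equal(x, 1.1998129))  # comparison within the default tolerance

print(shown)
print(exact)
print(close)
```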
You could print out a lot of digits, and that would work:
options(digits = 22)
DT$v[5]
#[1] 1.199812896606383683107
DT$v[5] == 1.199812896606383683107
#[1] TRUE
DT[J(1.199812896606383683107)]
# x v
#1: b 1.199812896606383683107
but there is an additional subtlety worth noting here, in that R and data.table use different precisions when deciding whether two floating point numbers are equal:
DT$v[5] == 1.19981289660638
#[1] FALSE
DT[J(1.19981289660638)]
# x v
#1: b 1.199812896606379908349
Long story short - be careful when joining floating point numbers.
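If the goal is to find the row whose v is approximately a given number, two options avoid exact equality altogether. This is a sketch, not a recommendation from the original answer: the values are the rounded ones from the question, and target and tol are hypothetical names chosen for illustration.

```r
library(data.table)

# Assumption: values copied (rounded) from the question's printout
DT <- data.table(x = c("b", "b", "b", "a", "a"),
                 v = c(-0.1388900, 1.1998129, -0.7477224, -0.5752482, -0.2635815))
setkey(DT, v)

target <- 1.19981   # hypothetical, deliberately imprecise lookup value
tol <- 1e-4         # hypothetical tolerance; pick one that suits your data

# Option 1: plain logical filter within a tolerance (no join involved)
near <- DT[abs(v - target) < tol]

# Option 2: rolling join to the nearest key value
nearest <- DT[.(target), roll = "nearest"]

print(near)
print(nearest)
```

The filter scans the column and can return several rows, while the rolling join uses the key and returns one match per lookup value.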
Upvotes: 1