Michael
Michael

Reputation: 5898

Using .BY with a lookup table--unexpected results

I'd like to create a variable in dt according to a lookup table k. I'm getting some unexpected results depending on how I extract the variable of interest in k.

dt <- data.table(x=c(1:10))
setkey(dt, x)

k <- data.table(x=c(1:5,10), b=c(letters[1:5], "d"))
setkey(k, x)

dt[,b:=k[.BY, list(b)],by=x]

dt  #unexpected results
#      x  b
#  1:  1  1
#  2:  2  2
#  3:  3  3
#  4:  4  4
#  5:  5  5
#  6:  6  6
#  7:  7  7
#  8:  8  8
#  9:  9  9
# 10: 10 10

dt <- data.table(x=c(1:10))
setkey(x, x)

dt[,b:=k[.BY]$b,by=x]

dt  #expected results
#      x  b
#  1:  1  a
#  2:  2  b
#  3:  3  c
#  4:  4  d
#  5:  5  e
#  6:  6 NA
#  7:  7 NA
#  8:  8 NA
#  9:  9 NA
# 10: 10  d

Can anyone explain why this is happening?

Upvotes: 4

Views: 155

Answers (1)

Arun
Arun

Reputation: 118889

You don't have to use by=. here at all.

First solution:

Set appropriate keys and use X[Y] syntax from data.table:

require(data.table)
dt <- data.table(x=c(1:10))
setkey(dt, "x")
k <- data.table(x=c(1:5,10), b=c(letters[1:5], "d"))
setkey(k, "x")

k[dt]

#      x  b
#  1:  1  a
#  2:  2  b
#  3:  3  c
#  4:  4  d
#  5:  5  e
#  6:  6 NA
#  7:  7 NA
#  8:  8 NA
#  9:  9 NA
# 10: 10  d

OP said that this creates a new data.table and it is undesirable for him.

Second solution

Again, without by:

dt <- data.table(x=c(1:10))
setkey(dt, "x")
k <- data.table(x=c(1:5,10), b=c(letters[1:5], "d"))
setkey(k, "x")

# solution
dt[k, b := i.b]

This does not create a new data.table and gives the solution you're expecting.

To explain why the unexpected result happens:

For the first case you do, dt[,b:=k[.BY, list(b)],by=x]. Here, k[.BY, list(b)] itself returns a data.table. For example:

k[list(x=1), list(b)]

#    x b
# 1: 1 a

So, basically, if you would do:

k[list(x=dt$x), list(b)]

That would give you the desired solution as well. To answer why you get what you get when you do b := k[.BY, list(b)], since, the RHS returns a data.table and you're assigning a variable to it, it takes the first element and drops the rest. For example, do this:

dt[, c := dt[1], by=x] 
# you'll get the whole column to be 1

For the second case, to understand why it works, you'll have to know the subtle difference between, accessing a data.table as k[6] and k[list(6)], for example:

In the first case, k[6], you are accessing the 6th element of k, which is 10 d. But in the second case, you're asking for a J, join. So, it searches for x = 6 (key column) and since there isn't any in k, it returns 6 NA. In your case, since you use k[.BY] which returns a list, it is a J operation, which fetches the right value.

I hope this helps.

Upvotes: 3

Related Questions