Timothée HENRY

Reputation: 14604

R data.table J behavior

I am still puzzled by the behavior of data.table J.

> DT = data.table(A=7:3,B=letters[5:1])
> DT
   A B
1: 7 e
2: 6 d
3: 5 c
4: 4 b
5: 3 a
> setkey(DT, A, B)

> DT[J(7,"e")]
   A B
1: 7 e

> DT[J(7,"f")]
   A B
1: 7 f  # <- there is no such line in DT

There is no row with A = 7 and B = "f" in DT, so why does this query return one?

Upvotes: 6

Views: 162

Answers (2)

MattLBeck

Reputation: 5831

J(7, "f") is literally a single-row data.table that you are joining to your own data.table. When you call x[i], data.table takes each row of i and finds all of its matches in x. By default, a row of i that matches nothing is still returned, with NA in the columns that come from x. This is easier to see after adding another column to DT:

DT <- data.table(A=7:3,B=letters[5:1],C=letters[1:5])
setkey(DT, A, B)
DT[J(7,"f")]
#    A B  C
# 1: 7 f NA

The row you are seeing is the single row of J(7, "f"), which has no match in DT. To stop data.table from returning non-matching rows, use nomatch=0:

DT[J(7,"f"), nomatch=0]
# Empty data.table (0 rows) of 3 cols: A,B,C
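
To make the join explicit, here is a minimal sketch that builds the lookup table by hand instead of with J(). The name `lookup` and its two rows (one that matches, one that does not) are illustrative assumptions, not part of the original example.

```r
library(data.table)

DT <- data.table(A = 7:3, B = letters[5:1], C = letters[1:5])
setkey(DT, A, B)

# Hypothetical lookup table: the first row exists in DT, the second does not.
lookup <- data.table(A = c(7L, 7L), B = c("e", "f"))

# Default (nomatch = NA): every row of `lookup` appears in the result;
# the unmatched row gets NA in the columns that come from DT.
res_default <- DT[lookup]

# nomatch = 0L: unmatched rows of `lookup` are dropped (an inner join).
res_inner <- DT[lookup, nomatch = 0L]

print(res_default)
print(res_inner)
```

With the default, the result has as many rows as `lookup` (when each lookup row matches at most one row of DT), which is why the non-existent (7, "f") row shows up.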

Upvotes: 6

shadow

Reputation: 22293

Perhaps adding an additional column will shed some light on what is going on.

DT[, C:=paste0(A, B)]

DT[J(7,"e")]
###    A B  C
### 1: 7 e 7e

DT[J(7,"f")]
###    A B  C
### 1: 7 f NA

The same thing happens without J, when you look up a key value directly:

setkey(DT, B)

DT["a"]
###    B A  C
### 1: a 3 3a

DT["A"]
###    B  A  C
### 1: A NA NA

You can use the nomatch argument to change this behavior.

setkey(DT, A, B)  # restore the two-column key, since DT was re-keyed on B above
DT[J(7,"f"), nomatch=0L]
### Empty data.table (0 rows) of 3 cols: A,B,C
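
If the goal is only to check whether a key combination exists, one option is the `which` argument of `[.data.table`, which returns row numbers instead of rows. The sketch below is a minimal example under the assumption that your data.table version supports `on=`; the names `lookup`, `idx`, and `found` are hypothetical.

```r
library(data.table)

DT <- data.table(A = 7:3, B = letters[5:1])

# Hypothetical lookup table: (7, "e") is in DT, (7, "f") is not.
lookup <- data.table(A = c(7L, 7L), B = c("e", "f"))

# which = TRUE returns, for each row of `lookup`, the matching row number
# in DT, or NA when there is no match. on= avoids having to set a key.
idx <- DT[lookup, on = c("A", "B"), which = TRUE]

# TRUE where the key combination actually exists in DT.
found <- !is.na(idx)

print(idx)
print(found)
```

Here `found` flags the rows of `lookup` that really exist in DT, so the NA-filled rows can be told apart from genuine matches.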

Upvotes: 2
