mchen
mchen

Reputation: 10156

Inconsistent data.table assignment by reference behaviour

When assigning by reference with a data.table using a column from a second data.table, the results are inconsistent. When there are no matches by the key columns of both data.tables, it appears the assigment expression y := y is totally ignored - not even NAs are returned.

library(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
print(dt1[dt2, y := y])
##    id x     # Would have also expected column:   y
## 1:  1 3     #                                   NA
## 2:  2 4     #                                   NA

However, when there is a partial match, non-matching columns have a placeholder NA.

dt2[, id := 2:3]
print(dt1[dt2, y := y])
##    id x  y
## 1:  1 3 NA    # <-- placeholder NA here
## 2:  2 4  5

This wreaks havoc on later code that assumes a y column exists in all cases. Otherwise I keep having to write cumbersome additional checks to take into account both cases.

Is there an elegant way around this inconsistency?

Upvotes: 14

Views: 365

Answers (2)

Arun
Arun

Reputation: 118799

With this recent commit, this issue, #759, is now fixed in v1.9.7. It works as expected when nomatch=NA (the current default).

require(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
dt1[dt2, y := y][]
#    id x  y
# 1:  1 3 NA
# 2:  2 4 NA

Upvotes: 2

Tchotchke
Tchotchke

Reputation: 3121

Using merge works:

> dt3 <- merge(dt1, dt2, by='id', all.x=TRUE)
> dt3
   id x  y
1:  1 3 NA
2:  2 4 NA

Upvotes: 1

Related Questions