BenoitLondon

Reputation: 907

data.table modifies parent environment / weird behavior with setDT

If I build my data.table from a data.frame of existing vectors and then call setDT, the original vector gets modified in the parent environment:

library(data.table)
a <- 1:2 / 2
x <- 1:10 / 2
y <- 11/2
dt <- data.frame(a, x, y)
setDT(dt)
dt[ , cond := a == 1]
dt[(cond), c("x", "y") := list(y, x)]
x
#[1] 0.5 5.5 1.5 5.5 2.5 5.5 3.5 5.5 4.5 5.5

For info, I am using R 3.5.1 and data.table 1.11.4.

If I use the data.table() constructor instead of data.frame() + setDT(), the vector x is not modified:

a <- 1:2 / 2
x <- 1:10 / 2
y <- 11/2
dt <- data.table(a, x, y)
dt[ , cond := a == 1]
dt[(cond), c("x", "y") := list(y, x)]
x
#[1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

Can somebody explain what is happening here, and whether it is a bug?

Cheers

EDIT1: I just found this related issue on GitHub: https://github.com/Rdatatable/data.table/issues/2683

EDIT2: The culprit was, unsurprisingly, sharing by reference: the vectors x and dt$x have the same memory address, so modifying the column also modifies the vector outside the data.table. I would have expected the data.frame creation to make a copy...

> a <- 1:2 / 2
> x <- 1:10 / 2
> y <- 11/2
> dt <- setDT(as.data.frame(list(a = a, x = x, y = y)))
> dt[ , cond := a == 1]
> dt[(cond), c("x", "y") := list(y, x)]
> x
[1] 0.5 5.5 1.5 5.5 2.5 5.5 3.5 5.5 4.5 5.5
> address(dt$x)
[1] "0xadd8fe8"
> address(x)
[1] "0xadd8fe8"

Upvotes: 7

Views: 237

Answers (1)

Arun

Reputation: 118839

setDT() modifies its input object by reference. If that input was itself built from shallow copies (as opposed to deep copies) of other objects, then all objects sharing that memory are modified when you use := or set() from data.table.

data.frame() seems to create shallow copies of its inputs wherever possible, for efficiency, so address(df$x) and address(x) are identical. That is acceptable in base R, because R copies on modify; := and set() skip that copy and update the shared vector in place.
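To see why the shallow copy is harmless in base R, here is a minimal sketch, assuming data.table is installed (it is used only for address()). The names x, df and shared are just for illustration, and whether the addresses actually match may depend on your R version, so no output is shown for that check.

```r
library(data.table)  # only needed for address()

x  <- 1:10 / 2
df <- data.frame(x = x)

# TRUE if the column and the original vector currently share memory
shared <- address(df$x) == address(x)
print(shared)

# A base R assignment copies the column before modifying it ...
df$x[2] <- 99

# ... so the original vector is left untouched
print(x[2])
print(df$x[2])
```

After the base R assignment, x[2] is still 1 while df$x[2] is 99, because R copied the column at the moment it was modified.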

You can avoid such scenarios by creating data.tables directly with data.table(). If instead you are handed a data.frame and have no idea how it was created, it is safer to use copy() on it before calling setDT(). HTH.
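A minimal sketch of the copy() route, assuming the data.frame df stands in for one you received from elsewhere (the names df and dt are only for illustration):

```r
library(data.table)

x  <- 1:10 / 2
df <- data.frame(x = x)  # may share memory with x

dt <- copy(df)           # deep copy, so dt no longer shares memory with x
setDT(dt)                # convert the copy by reference

# Sub-assign by reference; this only touches the copy
dt[x > 2, x := 0]

print(x)                 # the original vector keeps its values
```

The deep copy costs some memory up front, but after that := and set() can only touch the columns owned by dt.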

Upvotes: 9
