geneorama

Reputation: 3720

Should data.table operations have global scope even within function calls?

library(data.table)

DT = data.table(iris)

Here is the iris data as a data.table:

str(DT)
Classes ‘data.table’ and 'data.frame':  150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1
 - attr(*, ".internal.selfref")=<externalptr>

This is a simple function that sums the numeric columns of iris by first removing the factor column:

myfun = function(dt){
    dt[,Species:=NULL]
    return(sum(dt))
}

Run the function:

myfun(DT)
[1] 2078.7

Afterwards, DT in the global environment is missing the Species column:

str(DT)
Classes ‘data.table’ and 'data.frame':  150 obs. of  4 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 - attr(*, ".internal.selfref")=<externalptr>
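The behaviour is not really "global scope": R does not copy an argument when it is passed to a function, and := changes the data.table in place instead of going through R's usual copy-on-modify. A minimal sketch contrasting the two, assuming data.table is installed (the helper names drop_species_df and drop_species_dt are hypothetical):

```r
library(data.table)

# Base data.frame: the assignment inside the function triggers
# copy-on-modify, so the caller's DF keeps all 5 columns.
drop_species_df <- function(df) {
  df$Species <- NULL
  ncol(df)
}
DF <- iris
drop_species_df(DF)  # 4
ncol(DF)             # still 5

# data.table: := deletes the column in place, and the argument dt is the
# same object as the caller's DT, so the caller sees the change.
drop_species_dt <- function(dt) {
  dt[, Species := NULL]
  ncol(dt)
}
DT <- data.table(iris)
drop_species_dt(DT)  # 4
ncol(DT)             # now 4 as well
```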

Upvotes: 3

Views: 2073

Answers (2)

IRTFM

Reputation: 263332

This is a duplicate; I found it by searching for: [r] select columns data.table

Any of these work:

> sum(DT[,!"Species"])
[1] 2078.7
> sum(DT[,1:4])
[1] 2078.7
> sum(DT[,-5])
[1] 2078.7

'Species' is still in DT.
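The same column selection also works inside a function, which is what the question needs. A small sketch, assuming data.table is installed; sum_numeric is a hypothetical name. DT[, !"Species"] builds a new data.table without Species, so nothing is deleted from the caller's object:

```r
library(data.table)

DT <- data.table(iris)

# Select every column except Species instead of deleting it with :=.
sum_numeric <- function(dt) {
  sum(dt[, !"Species"])
}

sum_numeric(DT)            # [1] 2078.7
"Species" %in% names(DT)   # TRUE: DT is unchanged
```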

Upvotes: 0

Ricardo Saporta

Reputation: 55350

data.table operations such as := modify objects by reference. This is what makes it so fast and useful.

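But this also means you have to be careful when passing a data.table to a function. R does not copy the argument when it is passed, so if you modify it by reference inside the function without first making a copy, you will alter the original object.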
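The same sharing happens with plain assignment, which makes it easy to see what copy() changes. A sketch, assuming data.table is installed; DT_alias and independent are illustrative names. With `<-`, both names refer to one object, whereas copy() returns an independent deep copy:

```r
library(data.table)

DT <- data.table(iris)

# `<-` does not copy a data.table: DT_alias and DT are the same object,
# so deleting a column through DT_alias also removes it from DT.
DT_alias <- DT
DT_alias[, Species := NULL]
"Species" %in% names(DT)   # FALSE

# copy() makes an independent deep copy; := on it leaves DT alone.
DT <- data.table(iris)
independent <- copy(DT)
independent[, Species := NULL]
"Species" %in% names(DT)   # TRUE
```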

myfun = function(dt){
    # KEY LINE: work on a copy so the caller's data.table is not modified
    dt <- copy(dt)
    dt[,Species:=NULL]
    return(sum(dt))
}

Alternatively, you could call copy when you call your function, like so:

myfun(copy(DT))

But I think that leaves too much room for mistakes.
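A further option avoids both the in-function copy and the caller-side copy by never modifying the table at all. The sketch below is illustrative, not the answerer's method: it assumes data.table is installed, and sum_numeric_cols is a hypothetical name. It picks the numeric columns by type and selects them with with = FALSE, which returns a new data.table:

```r
library(data.table)

# Select the numeric columns by name rather than deleting Species in place.
sum_numeric_cols <- function(dt) {
  num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]
  sum(dt[, num_cols, with = FALSE])
}

DT <- data.table(iris)
sum_numeric_cols(DT)   # [1] 2078.7
ncol(DT)               # 5: DT still has Species
```

Selecting by type instead of by the name "Species" means the function keeps working if the table has other non-numeric columns.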

Upvotes: 3
