Reputation: 3720
library(data.table)
DT = data.table(iris)
The iris data as a data.table
str(DT)
> Classes ‘data.table’ and 'data.frame': 150 obs. of 5 variables:
> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1
> - attr(*, ".internal.selfref")=<externalptr>
This is just a simple function to add up the numeric columns of iris by removing the factor column.
myfun = function(dt){
  dt[, Species := NULL]
  return(sum(dt))
}
Run the function
myfun(DT)
> [1] 2078.7
Now DT is missing the Species column in the global environment, even though the column was only removed inside the function.
str(DT)
> Classes ‘data.table’ and 'data.frame': 150 obs. of 4 variables:
> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
> - attr(*, ".internal.selfref")=<externalptr>
Upvotes: 3
Views: 2073
Reputation: 263332
It's a duplicate, found by searching for: [r] select columns data.table
Any of these work:
> sum(DT[,!"Species"])
[1] 2078.7
> sum(DT[,1:4])
[1] 2078.7
> sum(DT[,-5])
[1] 2078.7
'Species' is still in DT.
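If the column selection needs to live inside a function, one possible sketch is shown here. The helper name sum_numeric is hypothetical (it is not part of data.table), and selecting the columns by type rather than by name is my own choice, not something from the question.

```r
library(data.table)

# Hypothetical helper: sums every numeric column without modifying `dt`.
sum_numeric <- function(dt) {
  # Pick the numeric columns by type rather than by name or position.
  num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]
  # with = FALSE makes j be read as a character vector of column names.
  # The subset is a new table, so the caller's object is left untouched.
  sum(dt[, num_cols, with = FALSE])
}

DT <- data.table(iris)
total <- sum_numeric(DT)
print(total)
print("Species" %in% names(DT))  # checks whether the column survived the call
```

Because nothing is assigned with := inside the function, no copy() is needed here.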
Upvotes: 0
Reputation: 55350
data.table works by reference: operations such as := modify the object in place instead of copying it. This is what makes it so fast and useful.
But this also means you have to be careful when passing arguments to functions. If you are not passing a copy, any := inside the function will alter the original object.
myfun = function(dt){
  # Use something like this
  dt <- copy(dt)   # <-- KEY LINE: work on a copy, not on the caller's table
  dt[, Species := NULL]
  return(sum(dt))
}
Alternatively, you could just call copy when you call your function, like so:
myfun(copy(DT))
But I think that leaves too much room for mistakes.
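To see the difference side by side, here is a small sketch that runs both variants on the same table and records the column count after each call. The function names drop_and_sum_copy and drop_and_sum_inplace are made up for this illustration.

```r
library(data.table)

# Hypothetical variant that modifies its argument in place.
drop_and_sum_inplace <- function(dt) {
  dt[, Species := NULL]   # := also changes the caller's table
  sum(dt)
}

# Hypothetical variant that works on a private copy.
drop_and_sum_copy <- function(dt) {
  dt <- copy(dt)          # deep copy; the := below only touches the copy
  dt[, Species := NULL]
  sum(dt)
}

DT <- data.table(iris)

safe_total <- drop_and_sum_copy(DT)
cols_after_copy <- ncol(DT)        # column count after the copying variant

unsafe_total <- drop_and_sum_inplace(DT)
cols_after_inplace <- ncol(DT)     # column count after the in-place variant

print(c(cols_after_copy, cols_after_inplace))
```

Both calls return the same sum; only the in-place variant should leave DT one column short. The copy does cost memory and time on large tables, so it is a trade-off rather than a free fix.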
Upvotes: 3