Reputation: 1626
I am looking for the best way to subset the iris dataset inside a function call. Here is the code:
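library(data.table)
data(iris)
remove_rows <- function(x)
{
  # the subset is bound only to the local x; the caller's object keeps all its rows
  x = setDT(x)[Species == "virginica"]
}
remove_rows(iris)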
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: 5.1 3.5 1.4 0.2 setosa
2: 4.9 3.0 1.4 0.2 setosa
3: 4.7 3.2 1.3 0.2 setosa
4: 4.6 3.1 1.5 0.2 setosa
5: 5.0 3.6 1.4 0.2 setosa
---
146: 6.7 3.0 5.2 2.3 virginica
147: 6.3 2.5 5.0 1.9 virginica
148: 6.5 3.0 5.2 2.0 virginica
149: 6.2 3.4 5.4 2.3 virginica
150: 5.9 3.0 5.1 1.8 virginica
As you can see, none of the rows are deleted after running the remove_rows function. This is understandable, as the data.table library does not have the functionality to remove rows by reference.
The workaround I have used is to update the remove_rows function so that it returns the new object:
library(data.table)
remove_rows <- function(x)
{
  x = setDT(x)[Species == "virginica"]
  return(x)
}
iris = remove_rows(iris)
This solves the problem, but since the data.table is huge in my case (iris is just a toy example), it takes a lot of time to run this function and copy the subset back into iris.
Is there a workaround for this situation?
Upvotes: 1
Views: 237
Reputation: 16697
This feature is not implemented yet, although it is highly requested. You can track its progress at https://github.com/Rdatatable/data.table/issues/635
The setsubset function that you are about to test is not complete. It lacks the C part that sets the true length of the object to something shorter than the original, so without that missing piece it won't help you much. As it stands, it will return the subset at the beginning of the data.table, and the remaining rows will be garbage.
For now you have to return a new object from the function and assign it to (possibly) the same variable as the one you passed to the function. If you really don't want to do this, you can always use assign to write into the parent frame, but it is less elegant.
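A minimal sketch of the assign approach, in case it helps. The helper name remove_rows_assign and the variable dt are made up for illustration, and the sketch assumes the argument is passed as a plain variable name. Note that the subset is still a new object, so this only saves the explicit reassignment at the call site, not the copy itself.

```r
library(data.table)

# Hypothetical helper (not part of data.table): subsets the table and
# rebinds the caller's variable to the result.
remove_rows_assign <- function(x) {
  # Name of the variable the caller passed in, e.g. "dt".
  # Assumption: the argument is a plain variable name, not an expression.
  name <- deparse(substitute(x))
  # Subsetting still creates a new, smaller data.table.
  result <- setDT(x)[Species == "virginica"]
  # Overwrite the variable of that name in the calling frame.
  assign(name, result, envir = parent.frame())
  invisible(NULL)
}

dt <- as.data.table(iris)
remove_rows_assign(dt)
nrow(dt)  # 50, the number of virginica rows in iris
```

The downside is that the function now silently modifies a variable in its caller's environment, which is why returning the object and assigning it explicitly is usually easier to follow.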
Upvotes: 3