Saurabh

Reputation: 1626

R Best way to delete data.table rows in a function call

I am looking for the best way to subset the iris dataset inside a function call. Here is the code -

library(data.table)
data(iris)

remove_rows <- function(x)
{
  x = setDT(x)[Species == "virginica"]
}
remove_rows(iris)
> iris
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica

As you can see, no rows are deleted after running the remove_rows function. This is understandable, since data.table has no functionality to remove rows by reference: the assignment inside the function only rebinds the local x to a new subset. The workaround I have used is to return the new object from remove_rows and assign it back -

library(data.table)
remove_rows <- function(x)
{
  x = setDT(x)[Species == "virginica"]
  return(x)
}
iris = remove_rows(iris)

This solves the problem, but since the data.table is huge in my case (iris is just a toy example), it takes a long time to run the function and copy the subset back into iris.

Is there a workaround for this situation?

Upvotes: 1

Views: 237

Answers (1)

jangorecki

Reputation: 16697

This feature is not yet implemented, although it is highly requested. You can track its progress at https://github.com/Rdatatable/data.table/issues/635

The setsubset function that you are about to test is not complete. It lacks the C part that sets the true length of the object to something shorter than the original, so without that missing piece it won't help you much. As it stands, it will place the subset at the beginning of the data.table, and the remaining rows will be garbage.

For now, you have to return a new object from the function and assign it to (possibly) the same variable that you passed in. If you really don't want to do this, you can always use assign to write into the parent frame, but that is less elegant.
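
A minimal sketch of the assign approach, assuming the function is always called with a plain variable name. The names dt, var_name and kept are made up for illustration. Note that this only hides the reassignment from the caller: the subset is still allocated as a new object, so it is unlikely to be any faster on a huge table.

```r
library(data.table)

remove_rows <- function(x) {
  # Name of the variable the caller passed in, e.g. "dt".
  # This assumes the argument is a bare symbol, not an expression.
  var_name <- deparse(substitute(x))

  # Subsetting still creates a new data.table; nothing is deleted by reference.
  kept <- setDT(x)[Species == "virginica"]

  # Rebind the caller's variable to the subset.
  assign(var_name, kept, envir = parent.frame())
  invisible(NULL)
}

dt <- as.data.table(iris)  # toy stand-in for the real, huge table
remove_rows(dt)
nrow(dt)                   # iris has 50 virginica rows
```

Because the function overwrites a variable in the calling environment as a side effect, it is harder to reason about than the explicit dt = remove_rows(dt) form, which is why the return-and-assign version is usually preferred.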

Upvotes: 3
