George Sotiropoulos

Reputation: 2133

Filtering a data.table using two variables, an elegant fast way

Is there a way to filter a data.table on a combination of more than one variable? To be more specific:

library(dplyr)
library(data.table)

# length.out keeps the group vector at exactly nrow(iris) elements
data <- iris %>% cbind(group = rep(c("a", "b", "c"), length.out = nrow(iris))) %>% as.data.table()

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species group
1:          5.1         3.5          1.4         0.2  setosa     a
2:          4.9         3.0          1.4         0.2  setosa     b
3:          4.7         3.2          1.3         0.2  setosa     c
4:          4.6         3.1          1.5         0.2  setosa     a
5:          5.0         3.6          1.4         0.2  setosa     b
6:          5.4         3.9          1.7         0.4  setosa     c

and I want to filter it based on the following data.table:

filter <- data.table(Species = c("setosa", "versicolor", 'setosa'), group = c('a', "b", 'c'))
      Species group
1:     setosa     a
2: versicolor     b
3:     setosa     c

I could do it this way:

data[paste(Species, group) %in% filter[, filter1 := paste(Species, group)]$filter1]

However, I would like to know if there is a faster or simpler way to do it, perhaps something like:

data[.(Species, group) %in% filter] # does not work

Upvotes: 2

Views: 808

Answers (1)

Frank

Reputation: 66819

In this case, you can join data to filter on all of filter's columns:

data[filter, on=names(filter), nomatch=0]
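As a rough, self-contained sketch of what this join does, the snippet below rebuilds the question's data under hypothetical names (`dt`, `keys`, `res`, `by_paste`); `keys` stands in for `filter` and is renamed only to avoid masking `dplyr::filter`. Note that `on = names(filter)` assumes `filter` holds nothing but the join columns. If a helper column such as `filter1` has already been added to it by reference, spelling the columns out, as done here, is probably safer.

```r
library(data.table)

# Rebuild the question's data: iris plus a group column cycling a, b, c
dt <- as.data.table(iris)
dt[, group := rep(c("a", "b", "c"), length.out = .N)]

# Lookup table of the (Species, group) combinations to keep
keys <- data.table(Species = c("setosa", "versicolor", "setosa"),
                   group   = c("a", "b", "c"))

# Join on both columns; nomatch = 0 drops combinations that are absent from dt
res <- dt[keys, on = c("Species", "group"), nomatch = 0]

# The paste()-based filter from the question, without modifying keys
by_paste <- dt[paste(Species, group) %in% keys[, paste(Species, group)]]

# Both approaches should select the same number of rows
nrow(res) == nrow(by_paste)
```

The two results contain the same rows, but the join returns them in the order of `keys` rather than in the original order of `dt`.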

See "Perform a semi-join with data.table" for similar filtering joins.
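A plain join is not quite a semi-join: its rows follow the order of the lookup table, and a combination listed twice there returns its matches twice. One possible way around this, sketched below with made-up toy data and hypothetical names (`dt`, `keys`, `joined`, `idx`, `semi`), is to ask the join for matching row numbers with `which = TRUE` and then de-duplicate and sort them. This is only an illustration of the idea, not necessarily the approach taken in the linked question.

```r
library(data.table)

# Toy data; the id column makes row order easy to follow
dt <- data.table(
  id      = 1:6,
  Species = c("setosa", "versicolor", "setosa", "setosa", "versicolor", "virginica"),
  group   = c("a", "b", "c", "a", "a", "b")
)

# Lookup table that deliberately repeats the (setosa, a) combination
keys <- data.table(Species = c("versicolor", "setosa", "setosa"),
                   group   = c("b", "a", "a"))

# Plain join: rows come back in keys' order, and the repeated key repeats its matches
joined <- dt[keys, on = c("Species", "group"), nomatch = 0]

# Semi-join: fetch the matching row numbers of dt, then de-duplicate and sort them
idx  <- dt[keys, on = c("Species", "group"), which = TRUE, nomatch = 0]
semi <- dt[sort(unique(idx))]

joined$id
semi$id
```

Here `semi` keeps each matching row of `dt` once and in its original order, which is usually what filtering is expected to do.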

Upvotes: 4
