Reputation: 1902

filtering or subsetting a dataframe using multiple columns matching values in a list

I would like to select a subset of rows from a dataframe using the values stored in a list. Subsetting a dataframe is a common topic (e.g. this or this), but in all those questions the values are known prior to runtime. I want to use a named list generated in a shiny app to pick out rows and display them in a datatable object. I'll use dplyr::filter() but I think the same idea should work with subset() or [].

library(dplyr) 
data("mtcars")

# get all 6 cylinder cars with 3 gears 
filter(mtcars, cyl == 6 & gear == 3)
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

So far so good, but I have those values in a named list.

pickthese <- list(cyl = 6, gear = 3)

I've tried variations of paste() with parse() and eval(), and this works but seems clumsy.

eval(
  parse(
    text = paste("filter(mtcars, ",
                 paste(paste0(names(pickthese)[1]," == ", pickthese[[1]]),
                       paste0(names(pickthese)[2]," == ", pickthese[[2]]), sep = ","),
                 ")")))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

I could turn the inner paste0() into a function and use *apply() or purrr::map to simplify some. Is there a better solution?

EDIT:

r2evans brought up a good point about what happens if the user screws up. I could think of two possible ways for this -- first, the user could select a value not in the dataframe, and second, the user could select a variable name not found in the dataframe. The second case shouldn't happen because the variable names in the app are set by me, but I'm not immune to screwing up either! I think the solution should either return a zero row data frame with the same columns (in first case), or cause an error.

# test case
lst <- list(cyl = 6.2, gear = 3, foo = "bar")

Created on 2018-07-17 by the reprex package (v0.2.0).

Upvotes: 1

Answers (4)

atiretoo

Reputation: 1902

I wish I could accept 3 answers! I'm learning so much here. I will combine bits of each of the above answers to get what I want in a very clear way.

nlst <- paste(names(pickthese), pickthese, sep="==") # from r2evans

mtcars %>% filter(!!! rlang::parse_exprs(nlst)) # from akrun

#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

I'm going to stick with this despite the concern about parsing, and also because of an issue with the merge() or inner_join() approach. All of the solutions work as expected if the value is not found in the data, returning zero row dataframes.

They do perform differently on the second issue, if the variable name is not part of the dataframe. Using merge() or inner_join() the erroneous variable name is added as a new column, even in a zero row dataframe. I just noticed another thing with merge() -- doesn't preserve the row or column order in the result. That wouldn't matter sometimes, but it does in this case because the result would display differently if I select different variables, and that would get annoying at least.

lst <- list(cyl = 6, gear = 3, foo = "bar") # 2nd problem

merge(mtcars, lst) # two rows, but silently adds column foo
#>   cyl gear  mpg disp  hp drat    wt  qsec vs am carb foo
#> 1   6    3 18.1  225 105 2.76 3.460 20.22  1  0    1 bar
#> 2   6    3 21.4  258 110 3.08 3.215 19.44  1  0    1 bar

inner_join(mtcars, as.data.frame(lst)) # two rows and silently adds column foo
#> Joining, by = c("cyl", "gear")
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb foo
#> 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 bar
#> 2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 bar

nlst <- paste(names(lst), lst, sep="==")
mtcars %>% filter(!!! rlang::parse_exprs(nlst)) # error -- tells me I screwed up the programming.
#> Error in filter_impl(.data, quo): Evaluation error: object 'foo' not found.

Annnd, one more addition. semi_join() as suggested below is the winner in my actual use case, as something about the actual data.frame breaks the parsing solution that works here - I think it is character variables.

Upvotes: 0

moodymudskipper

Reputation: 47320

The simplest is base R:

l <- list(cyl = 6, gear = 3)
merge(mtcars,l)
#   cyl gear  mpg disp  hp drat    wt  qsec vs am carb
# 1   6    3 18.1  225 105 2.76 3.460 20.22  1  0    1
# 2   6    3 21.4  258 110 3.08 3.215 19.44  1  0    1

With dplyr we need first to convert your list to a tibble / data.frame and join it to the table with right_join or inner_join and we end up with @r2evans' solution.

Alternatively if we really want to filter, we can use filter_at with reduce :

library(tidyverse) # for dplyr and purrr
reduce(imap(l,~setNames(.x,.y)),
  ~filter_at(.x, names(.y),all_vars(.== .y)),
  .init=mtcars)
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Upvotes: 2

akrun

Reputation: 887118

One option would be to create an expression that can be parsed

expr <- do.call(paste, c(stack(pickthese)[2:1], sep="==", collapse=";"))

or create with tidyverse

expr <- enframe(pickthese) %>% 
                unnest %>%
                reduce(paste, sep="==", collapse=";")


mtcars %>% 
    filter(!!! rlang::parse_exprs(expr))
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Upvotes: 1

r2evans

Reputation: 160447

More options, no parseing required.

lst <- list(cyl=6, gear=3)

df1 <- as.data.frame(lst)
mtcars %>% inner_join(df1)
# Joining, by = c("cyl", "gear")
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

nlst <- paste(names(lst), lst, sep="==")
mtcars %>% filter_(.dots=nlst)
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 2 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Upvotes: 2

filtering or subsetting a dataframe using multiple columns matching values in a list

Answers (4)

Related Questions