Reputation: 1902
I would like to select a subset of rows from a dataframe using the
values stored in a list. Subsetting a dataframe is a common topic (e.g.
this or this),
but in all those questions the values are known prior to runtime. I want
to use a named list generated in a shiny app to pick out rows and
display them in a datatable object. I'll use dplyr::filter()
but I
think the same idea should work with subset()
or []
.
library(dplyr)
data("mtcars")
# get all 6 cylinder cars with 3 gears
filter(mtcars, cyl == 6 & gear == 3)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
So far so good, but I have those values in a named list.
pickthese <- list(cyl = 6, gear = 3)
I've tried variations of paste()
with parse()
and eval()
, and this
works but seems clumsy.
eval(
parse(
text = paste("filter(mtcars, ",
paste(paste0(names(pickthese)[1]," == ", pickthese[[1]]),
paste0(names(pickthese)[2]," == ", pickthese[[2]]), sep = ","),
")")))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
I could turn the inner paste0()
into a function and use *apply()
or
purrr::map
to simplify some. Is there a better solution?
EDIT:
r2evans brought up a good point about what happens if the user screws up. I could think of two possible ways for this -- first, the user could select a value not in the dataframe, and second, the user could select a variable name not found in the dataframe. The second case shouldn't happen because the variable names in the app are set by me, but I'm not immune to screwing up either! I think the solution should either return a zero row data frame with the same columns (in first case), or cause an error.
# test case
lst <- list(cyl = 6.2, gear = 3, foo = "bar")
Created on 2018-07-17 by the reprex package (v0.2.0).
Upvotes: 1
Views: 132
Reputation: 1902
I wish I could accept 3 answers! I'm learning so much here. I will combine bits of each of the above answers to get what I want in a very clear way.
nlst <- paste(names(pickthese), pickthese, sep="==") # from r2evans
mtcars %>% filter(!!! rlang::parse_exprs(nlst)) # from akrun
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
I'm going to stick with this despite the concern about parsing, and also because of an issue with the merge()
or inner_join()
approach. All of the solutions work as expected if the value is not found in the data, returning zero row dataframes.
They do perform differently on the second issue, if the variable name is not part of the dataframe. Using merge()
or inner_join()
the erroneous variable name is added as a new column, even in a zero row dataframe. I just noticed another thing with merge()
-- doesn't preserve the row or column order in the result. That wouldn't matter sometimes, but it does in this case because the result would display differently if I select different variables, and that would get annoying at least.
lst <- list(cyl = 6, gear = 3, foo = "bar") # 2nd problem
merge(mtcars, lst) # two rows, but silently adds column foo
#> cyl gear mpg disp hp drat wt qsec vs am carb foo
#> 1 6 3 18.1 225 105 2.76 3.460 20.22 1 0 1 bar
#> 2 6 3 21.4 258 110 3.08 3.215 19.44 1 0 1 bar
inner_join(mtcars, as.data.frame(lst)) # two rows and silently adds column foo
#> Joining, by = c("cyl", "gear")
#> mpg cyl disp hp drat wt qsec vs am gear carb foo
#> 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 bar
#> 2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 bar
nlst <- paste(names(lst), lst, sep="==")
mtcars %>% filter(!!! rlang::parse_exprs(nlst)) # error -- tells me I screwed up the programming.
#> Error in filter_impl(.data, quo): Evaluation error: object 'foo' not found.
Annnd, one more addition. semi_join()
as suggested below is the winner in my actual use case, as something about the actual data.frame breaks the parsing solution that works here - I think it is character variables.
Upvotes: 0
Reputation: 47320
The simplest is base R
:
l <- list(cyl = 6, gear = 3)
merge(mtcars,l)
# cyl gear mpg disp hp drat wt qsec vs am carb
# 1 6 3 18.1 225 105 2.76 3.460 20.22 1 0 1
# 2 6 3 21.4 258 110 3.08 3.215 19.44 1 0 1
With dplyr
we need first to convert your list to a tibble
/ data.frame
and join it to the table with right_join
or inner_join
and we end up with @r2evans' solution.
Alternatively if we really want to filter, we can use filter_at
with reduce
:
library(tidyverse) # for dplyr and purrr
reduce(imap(l,~setNames(.x,.y)),
~filter_at(.x, names(.y),all_vars(.== .y)),
.init=mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Upvotes: 2
Reputation: 887118
One option would be to create an expression that can be parsed
expr <- do.call(paste, c(stack(pickthese)[2:1], sep="==", collapse=";"))
or create with tidyverse
expr <- enframe(pickthese) %>%
unnest %>%
reduce(paste, sep="==", collapse=";")
mtcars %>%
filter(!!! rlang::parse_exprs(expr))
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Upvotes: 1
Reputation: 160447
More options, no parse
ing required.
lst <- list(cyl=6, gear=3)
df1 <- as.data.frame(lst)
mtcars %>% inner_join(df1)
# Joining, by = c("cyl", "gear")
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
nlst <- paste(names(lst), lst, sep="==")
mtcars %>% filter_(.dots=nlst)
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 2 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Upvotes: 2