Reputation: 559
I am attempting to filter a database-backed tibble by piping to a filter()
command and am observing unexpected behavior:
If I filter using filter(pos == variable)
, I get the same result regardless of what value I assign to the variable. However, filtering with the value, e.g. filter(pos == 12345)
works - the result changes as it should for each different value I'm filtering on.
Is this an aspect of lazy evaluation or tidyeval? What is the correct way to filter()
a DB-backed tibble using a variable?
Here's a reproducible example:
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")
ex_data <- tibble(
pos = c(10510138, 10510507),
ref = c("CATA", "TCA"),
alt = c("C", "T")
)
copy_to(con, ex_data, "variants", temporary = FALSE)
toQueryDB <- tbl(con, "variants")
pos = 10510138
(result <- toQueryDB %>% filter(pos == pos) %>% select(pos, ref, alt) %>% head(1))
# 10510138 CATA C
pos = 10510507
(result <- toQueryDB %>% filter(pos == pos) %>% select(pos, ref, alt) %>% head(1))
# STILL 10510138 CATA C !!!
(result <- toQueryDB %>% filter(pos == 10510138) %>% select(pos, ref, alt) %>% head(1))
# 10510138 CATA C
(result <- toQueryDB %>% filter(pos == 10510507) %>% select(pos, ref, alt) %>% head(1))
# 10510507 TCA T
DBI::dbDisconnect(con)
And my session info:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.7.6
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 rstudioapi_0.7 bindr_0.1.1 magrittr_1.5 tidyselect_0.2.4 bit_1.1-14 R6_2.2.2
[8] rlang_0.2.2 fansi_0.3.0 blob_1.1.1 tools_3.5.1 utf8_1.1.4 cli_1.0.1 DBI_1.0.0
[15] dbplyr_1.2.2 yaml_2.2.0 bit64_0.9-7 assertthat_0.2.0 digest_0.6.17 tibble_1.4.2 crayon_1.3.4
[22] bindrcpp_0.2.2 purrr_0.2.5 memoise_1.1.0 glue_1.3.0 RSQLite_2.1.1 compiler_3.5.1 pillar_1.3.0
[29] pkgconfig_2.0.2
Upvotes: 3
Views: 512
Reputation: 5269
To elaborate on MrFlick's answer, you can also use !!
followed by parentheses for an in-place calculation instead of just evaluating an object, such as:
con %>%
tbl("some_table") %>%
filter(date > !!(Sys.Date() - 1000))
Longer, reproducible answer:
library(dplyr)
# we'll use duckdb here, of course
library(duckdb)
# connect, will be in-memory
db <- dbConnect(duckdb())
# create data set with 10k dates between 25k days ago and today
df <- data.frame(date = Sys.Date() + runif(10000, min = -25000, max = 0),
value = runif(10000))
# write to the duckdb
dbWriteTable(db, "test", df)
# I want all values from last 1000 days, though this won't work:
db %>%
tbl("test") %>%
filter(date > Sys.Date() - 1000)
#> Error in `collect()`:
#> ! Failed to collect lazy table.
#> ! rapi_prepare: Failed to prepare query SELECT *
#> FROM test
#> WHERE (date > (Sys.Date() - 1000.0))
#> LIMIT 11
#> Error: Catalog Error: Scalar Function with name date does not exist!
#> Did you mean "main.day"?
#> LINE 3: WHERE (date > (Sys.Date() - 1000.0))
#> ^
# now let the magic start, use !! with parentheses:
db %>%
tbl("test") %>%
filter(date > !!(Sys.Date() - 1000))
#> # Source: SQL [?? x 2]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.6.0:R 4.3.1/:memory:]
#> date value
#> <date> <dbl>
#> 1 2022-05-28 0.809
#> 2 2021-10-11 0.267
#> 3 2023-01-09 0.657
#> 4 2021-12-05 0.0280
#> 5 2020-11-10 0.0411
#> 6 2021-12-11 0.0912
#> 7 2021-04-05 0.726
#> 8 2021-02-10 0.315
#> 9 2020-12-18 0.877
#> 10 2022-11-01 0.230
#> # ℹ more rows
#> # ℹ Use `print(n = ...)` to see more rows
Tada!
Upvotes: 0
Reputation: 206586
When you have
filter(pos == pos)
dplyr doesn't try to figure out which pos
is which and it assumes they both are coming from the data that's piped into the function. If you want to inject the value of a variable from outside the data being passed in, then you need to use the bang-bang rlang operator (!!
). You should use
filter(pos == !!pos)
Upvotes: 6