Reputation: 549
I'm trying to select lines from one table ("positons") with values for a particular column ("position") that fall within the ranges defined in another ("my_ranges") table, and then to add a grouping tag from the "my_ranges" table.
I can do this using tibbles and a couple purrr::map2
calls, but the same approach doesn't work with dbplyr database-tibbles. Is this expected behavior, and if so, is there a different approach that I should take to use dbplyr for this kind of task?
Here's my example:
library("tidyverse")
set.seed(42)
my_ranges <-
tibble(
group_id = c("a", "b", "c", "d"),
start = c(1, 7, 2, 25),
end = c(5, 23, 7, 29)
)
positions <-
tibble(
position = as.integer(runif(n = 100, min = 0, max = 30)),
annotation = stringi::stri_rand_strings(n = 100, length = 10)
)
# note: this works as I expect and returns a tibble with 106 obs of 3 variables:
result <- map2(.x = my_ranges$start, .y = my_ranges$end,
.f = function(x, y) {between(positions$position, x, y)}) %>%
map2(.y = my_ranges$group_id,
.f = function(x, y){
positions %>%
filter(x) %>%
mutate(group_id = y)}
) %>% bind_rows()
# next, make an in-memory db for testing:
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
# copy data to db
copy_to(con, my_ranges, "my_ranges", temporary = FALSE)
copy_to(con, positions, "positions", temporary = FALSE)
# get db-backed tibbles:
my_ranges_db <- tbl(con, "my_ranges")
positions_db <- tbl(con, "positions")
# note: this does not work as I expect, and instead returns a tibble with 0 obsevations of 0 variables:
# database range-based query:
db_result <- map2(.x = my_ranges_db$start, .y = my_ranges_db$end,
.f = function(x, y) {
between(positions_db$position, x, y)
}) %>%
map2(.y = my_ranges_db$group_id,
.f = function(x, y){
positions_db %>%
filter(x) %>%
mutate(group_id = y)}
) %>% bind_rows()
Upvotes: 3
Views: 1417
Reputation: 675
As long as each iteration creates a table of the same dimensions, then there may be a neat way of pushing the entire operation to the database. The idea is to use both map()
and reduce()
from purrr
. Each tbl_sql()
operation is lazy, so we can iterate through them without worrying of sending a bunch of queries, and then we can use union()
which will basically append the resulting SQL from each iteration to the next using the UNION
clause from the given database. Here's an example:
library(dbplyr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(purrr, warn.conflicts = FALSE)
library(DBI, warn.conflicts = FALSE)
library(rlang, warn.conflicts = FALSE)
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")
db_mtcars <- copy_to(con, mtcars)
cyls <- c(4, 6, 8)
all <- cyls %>%
map(~{
db_mtcars %>%
filter(cyl == .x) %>%
summarise(mpg = mean(mpg, na.rm = TRUE)
)
}) %>%
reduce(function(x, y) union(x, y))
all
#> # Source: lazy query [?? x 1]
#> # Database: sqlite 3.22.0 []
#> mpg
#> <dbl>
#> 1 15.1
#> 2 19.7
#> 3 26.7
show_query(all)
#> <SQL>
#> SELECT AVG(`mpg`) AS `mpg`
#> FROM (SELECT *
#> FROM (SELECT *
#> FROM `mtcars`)
#> WHERE (`cyl` = 4.0))
#> UNION
#> SELECT AVG(`mpg`) AS `mpg`
#> FROM (SELECT *
#> FROM (SELECT *
#> FROM `mtcars`)
#> WHERE (`cyl` = 6.0))
#> UNION
#> SELECT AVG(`mpg`) AS `mpg`
#> FROM (SELECT *
#> FROM (SELECT *
#> FROM `mtcars`)
#> WHERE (`cyl` = 8.0))
dbDisconnect(con)
Upvotes: 6
Reputation: 47300
dbplyr
translates R
into SQL
. Lists don't exist in SQL
. map
creates lists. Thus it's impossible to translate map
into SQL
.
Mainly dplyr
functions and some base
functions are translated, they're working on tidyr
functions too as I understood. When using dbplyr
try to have an SQL
logic in your approach or it will easily break.
Upvotes: 3