Reputation: 51
Consider the following SQL:
SELECT D.product_name
FROM business.payment P
LEFT JOIN dim.product_name D
ON D.product_id = P.product_id
The query returns the list of product_names that are in the payment table and does so based on a join.
How would something like this be replicated in dplyr without pulling into memory? I'm working with a database connection.
I tried the following, but to no avail:
product_name <-
business %>%
tbl('dim_product')
business %>%
tbl('payment') %>%
left_join(product_name, by = 'product_id') %>%
select(product_name) %>%
collect()
I've searched quite a big and no one seems to have addressed this.
Thanks!
Upvotes: 2
Views: 1765
Reputation: 103
This is a while after the fact but maybe you're still looking or curious for a dplyr
verb based option. I was working on the same issue for my work and came across your (sort of) unanswered question. Below works for me when I run it against an MSSQL database using DBI
and odbc
packages.
I selected the columns of interest from the tables before joining as that is generally best practice when querying databases. dplyr
join functions will do a natural join by default so you might not have to provide the by
argument explicitly.
db_con <- DBI::dbConnect(
drv = odbc::odbc(),
dsn = <data source name>
)
db_con %>%
tbl("table1") %>%
select(col1, col2, col3) %>%
left_join(
db_con %>% tbl("table2") %>% select(col3, col4, col5)
)
Upvotes: 5
Reputation: 23231
You can use arbitrary SQL in dplyr
:
tbl(my_data, sql("SELECT * FROM flights"))
Note this is no different than querying a DB through R via any other means, such as by RODBC
.
Of course, the extracted data will always be in memory once it's imported into R via the query. The one way that it wouldn't be in memory at all is if you used RPostgres
or RODBC
, etc to send a SQL query and just used it to create a new table within the database and never exported the data to R.
Upvotes: 3