Phillip Black
Phillip Black

Reputation: 51

dplyr sql joins

Consider the following SQL:

    SELECT D.product_name
      FROM business.payment P
 LEFT JOIN dim.product_name D 
        ON D.product_id = P.product_id

The query returns the list of product_names that are in the payment table and does so based on a join.

How would something like this be replicated in dplyr without pulling into memory? I'm working with a database connection.

I tried the following, but to no avail:

product_name <- 
  business %>% 
  tbl('dim_product') 

business %>% 
  tbl('payment') %>% 
  left_join(product_name, by = 'product_id') %>% 
  select(product_name) %>% 
  collect()

I've searched quite a big and no one seems to have addressed this.

Thanks!

Upvotes: 2

Views: 1765

Answers (2)

Jordan Kassof
Jordan Kassof

Reputation: 103

This is a while after the fact but maybe you're still looking or curious for a dplyr verb based option. I was working on the same issue for my work and came across your (sort of) unanswered question. Below works for me when I run it against an MSSQL database using DBI and odbc packages.

I selected the columns of interest from the tables before joining as that is generally best practice when querying databases. dplyr join functions will do a natural join by default so you might not have to provide the by argument explicitly.

db_con <- DBI::dbConnect(
  drv = odbc::odbc(),
  dsn = <data source name>
)

db_con %>%
  tbl("table1") %>%
  select(col1, col2, col3) %>%
  left_join(
    db_con %>% tbl("table2") %>% select(col3,  col4, col5)
  )

Upvotes: 5

Hack-R
Hack-R

Reputation: 23231

You can use arbitrary SQL in dplyr:

tbl(my_data, sql("SELECT * FROM flights"))

Note this is no different than querying a DB through R via any other means, such as by RODBC.

Of course, the extracted data will always be in memory once it's imported into R via the query. The one way that it wouldn't be in memory at all is if you used RPostgres or RODBC, etc to send a SQL query and just used it to create a new table within the database and never exported the data to R.

Upvotes: 3

Related Questions