Reputation: 31
Using dbConnect, I established connections to multiple SQL DBs (say, DB1 and DB2). How can I write a query that involves tables from both DB1 and DB2? Does dbGetQuery allow querying only one DB at a time? Can the sqldf package be leveraged after the DB connections have been made?
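One workaround, if both databases happen to be SQLite files, is to ATTACH the second file to the first connection so a single dbGetQuery can join across them. This is a sketch, not from the question itself; the table and column names are made up, and the snippet builds two throwaway databases so it runs on its own:

```r
# Sketch assuming both DBs are SQLite; two temp files stand in for DB1/DB2.
library(DBI)

f1 <- tempfile(fileext = ".sqlite")
f2 <- tempfile(fileext = ".sqlite")

con1 <- dbConnect(RSQLite::SQLite(), f1)
dbWriteTable(con1, "tbl_a", data.frame(id = 1:3, x = c(10, 20, 30)))
dbDisconnect(con1)

con2 <- dbConnect(RSQLite::SQLite(), f2)
dbWriteTable(con2, "tbl_b", data.frame(id = 2:3, y = c("b", "c")))
dbDisconnect(con2)

# One connection, with the second file mounted under the alias db2:
con <- dbConnect(RSQLite::SQLite(), f1)
dbExecute(con, sprintf("ATTACH DATABASE '%s' AS db2", f2))

res <- dbGetQuery(con, "
  SELECT a.id, a.x, b.y
  FROM   main.tbl_a a
  JOIN   db2.tbl_b b ON a.id = b.id
")
dbDisconnect(con)

res
```

Two separate dbConnect handles cannot be joined in one dbGetQuery call; for non-SQLite backends the usual route is a server-side cross-schema query, or pulling both tables into R as data frames and joining there (which is also where sqldf can be applied, since it queries data frames).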
Upvotes: 3
Views: 1557
Reputation: 1144
This isn't the answer you're looking for, but I've had the same problem.
In short, I would drop the idea of doing joins/grouping/subqueries across tables in one (or more) DBs in SQL. With the newer big-data packages in R, specifically dplyr or data.table, there is rarely a need. The main exception I can think of, where SQL is faster, is when your query results are large enough to take up too much RAM.
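The workflow described above can be sketched as follows: pull each table into R with one dbGetQuery per connection, then join locally with dplyr. The connections and table names are hypothetical, so stand-in data frames are used to keep the snippet runnable:

```r
# Sketch: pull each table into R once, then join locally with dplyr.
library(dplyr)

# With live connections this would be:
#   df1 <- dbGetQuery(con_db1, "SELECT id, x FROM tbl_a")
#   df2 <- dbGetQuery(con_db2, "SELECT id, y FROM tbl_b")
# Stand-ins so the snippet runs without a database:
df1 <- data.frame(id = 1:4, x = c(1, 2, 3, 4))
df2 <- data.frame(id = 3:5, y = c("c", "d", "e"))

# Equivalent to a SQL inner join across the two result sets:
joined <- inner_join(df1, df2, by = "id")
```

Base R's merge(df1, df2, by = "id") gives the same result without the dplyr dependency; data.table offers a keyed-join equivalent for larger data.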
An interesting use case for me is the following: my tables, coming from an MPP database, are around 20B rows. The problem: pull an entire result set of 2M rows and use dplyr::group_by() to group on 3 variables, or just do the GROUP BY in SQL and return the final result of 100k rows.
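The two routes being compared look roughly like this; the data here is simulated, since the MPP tables above obviously aren't available:

```r
# Sketch of the comparison above: grouping in R vs. pushing GROUP BY to SQL.
library(dplyr)

set.seed(1)
rows <- data.frame(               # stand-in for the 2M-row result set
  g1  = sample(letters[1:3], 1000, replace = TRUE),
  g2  = sample(1:2,          1000, replace = TRUE),
  g3  = sample(c("x", "y"),  1000, replace = TRUE),
  val = runif(1000)
)

# Route 1: pull the full result set, then group on 3 variables in R:
agg_r <- rows |>
  group_by(g1, g2, g3) |>
  summarise(total = sum(val), .groups = "drop")

# Route 2: push the aggregation to the database instead, e.g.:
# dbGetQuery(con, "SELECT g1, g2, g3, SUM(val) AS total
#                  FROM big_table GROUP BY g1, g2, g3")
```

Both routes return one row per (g1, g2, g3) combination; the difference is whether the 2M rows cross the wire before or after aggregation.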
Timing-wise, there is always a tipping point where R or SQL is faster, and except for maybe a dimension-table join in MySQL, R is almost always faster. (My example sits right at the tipping point for my hardware.)
With dplyr as easy to use as SQL, I'm not sure we need to ask this question anymore.
Upvotes: 0