Reputation: 11
I'm using dbplyr to access a complex warehouse in snowflake, which contains multiple databases. I have write access to one of them, and read access to the rest. Sample structure
WH_a
- schema_a
- table_aa
- schema_b
- table_ba
WH_b
- schema_c
- table_ca
Following dbplyr documentation, I set the working database and schema to be "WH_a.schema_a":
dbGetQuery(conn, "USE DATABASE WH_a")
dbGetQuery(conn, "USE SCHEMA schema_a")
and try to create table references. Simple table referencing works fine in the same schema:
aa <- tbl(conn, "table_aa")
if I want to reference a table in a different schema (same WH_a), I can make use of in_schema() function with no problem:
ba <- tbl(conn, in_schema("schema_b", "table_ba"))
However, I run into problems when I try to reference a table in a different warehouse.
ca <- tbl(conn, in_schema("WH_b.schema_c", "table_ca"))
nanodbc/nanodbc.cpp:1374: 00000: SQL compilation error:
Schema 'WH_a."WH_b.schema_c"' does not exist or not authorized.
Looks like the in_schema call inherits the current database and is not able to go one level above. I've been all over the docs, but most examples refer to much simpler databases where this is not an issue. Testing various combinations of setting and unsetting different schemas/warehouses did not result in success... Eventually I did find work around by passing a straight sql statement
ca <- tbl(conn, sql("SELECT * FROM WH_b.schema_c.table_ca"))
However, this creates very ugly (and possibly inefficient) SQL code, where the select statement is inserted in brackets instead of just a table name. It is much harder to read and just does feel like the right thing to do in the long run
Is there a simpler/more efficient solution to this?
Many thanks
Upvotes: 1
Views: 467
Reputation: 601
This maybe because, unlike Snowflake, quite a few databases don't support cross database joins so maybe not considered in the design of dbplyr.
You can use the sql()
function to enclose the database.schema or database.schema.table strings which from the testing below seems to work.
# Create Databases, Schemas & Tables
dbGetQuery(conn, "CREATE DATABASE WH_A")
dbGetQuery(conn, "CREATE SCHEMA SCHEMA_A")
dbGetQuery(conn, "USE DATABASE WH_A")
dbGetQuery(conn, "USE SCHEMA SCHEMA_A")
dbGetQuery(conn, "CREATE TABLE TABLE_AA as
select c1, c2
from (values (1, 'one'), (2, 'two')) as v1 (c1, c2);")
dbGetQuery(conn, "CREATE SCHEMA SCHEMA_B")
dbGetQuery(conn, "USE SCHEMA SCHEMA_B")
dbGetQuery(conn, "CREATE TABLE TABLE_BA as
select c1, c2
from (values (1, 'one'), (2, 'two')) as v1 (c1, c2);")
dbGetQuery(conn, "CREATE DATABASE WH_B")
dbGetQuery(conn, "USE DATABASE WH_B")
dbGetQuery(conn, "CREATE SCHEMA SCHEMA_C")
dbGetQuery(conn, "USE SCHEMA SCHEMA_C")
dbGetQuery(conn, "SELECT CURRENT_DATABASE() as db, CURRENT_SCHEMA() as sc;")
dbGetQuery(conn, "CREATE TABLE TABLE_CA as
select c1, c2
from (values (1, 'one'), (2, 'two')) as v1 (c1, c2);")
# Create tbl for TABLE_AA
dbGetQuery(conn, "USE DATABASE WH_A")
dbGetQuery(conn, "USE SCHEMA SCHEMA_A")
(aa <- tbl(conn, "TABLE_AA") )
aa$ops # Check database.schema.table reference
# Create tbl for TABLE_BA
(ba <- tbl(conn, in_schema("SCHEMA_B", "TABLE_BA")) )
ba$ops
# Create tbl for TABLE_CA
(ca <- tbl(conn, in_schema("WH_B.SCHEMA_C", "TABLE_CA"))) # Fails
(ca <- tbl(conn, in_schema(sql("WH_B.SCHEMA_C"), "TABLE_CA"))) # Works
ca$ops
# From: WH_B.SCHEMA_C."TABLE_CA"
# <Table: WH_B.SCHEMA_C."TABLE_CA">
ca <- tbl(conn, sql("WH_B.SCHEMA_C.TABLE_CA")) # Also Works
ca$ops
# From: <derived table>
# <Table: WH_B.SCHEMA_C.TABLE_CA>
# Check cross schema join works
inner_join(aa,ba,by='C1')
# Check cross database join works
inner_join(aa,ca,by='C1')
# Clean up
dbGetQuery(conn, "DROP DATABASE WH_A")
dbGetQuery(conn, "DROP DATABASE WH_B")
If your database objects are using mixed case you will need to take care to quote them correctly within Sql()
dbGetQuery(conn, 'CREATE DATABASE "WH_b" ')
dbGetQuery(conn, 'USE DATABASE "WH_b" ')
dbGetQuery(conn, 'CREATE SCHEMA "schema_c" ')
dbGetQuery(conn, 'USE SCHEMA "schema_c" ')
dbGetQuery(conn, 'SELECT CURRENT_DATABASE() as db, CURRENT_SCHEMA() as sc;')
dbGetQuery(conn, 'CREATE TABLE "table_ca" as
select c1, c2
from (values (1, \'one\'), (2, \'two\')) as v1 (c1, c2);')
dbGetQuery(conn, 'SELECT * FROM "WH_b"."schema_c"."table_ca";')
(ca <- tbl(conn, sql(' "WH_b"."schema_c"."table_ca" '))) # Quoted mixed case object names also Works
ca$ops
# Clean up
dbGetQuery(conn, 'DROP DATABASE "WH_b" ')
Upvotes: 1