aris
aris

Reputation: 11

How to get dbplyr in_schema to reference different warehouse

I'm using dbplyr to access a complex warehouse in snowflake, which contains multiple databases. I have write access to one of them, and read access to the rest. Sample structure

WH_a
 - schema_a 
   - table_aa
 - schema_b
   - table_ba
WH_b
 - schema_c
    - table_ca

Following dbplyr documentation, I set the working database and schema to be "WH_a.schema_a":

dbGetQuery(conn, "USE DATABASE WH_a")
dbGetQuery(conn, "USE SCHEMA schema_a")

and try to create table references. Simple table referencing works fine in the same schema:

aa <- tbl(conn, "table_aa") 

if I want to reference a table in a different schema (same WH_a), I can make use of in_schema() function with no problem:

ba <- tbl(conn, in_schema("schema_b", "table_ba")) 

However, I run into problems when I try to reference a table in a different warehouse.

ca <- tbl(conn, in_schema("WH_b.schema_c", "table_ca")) 

nanodbc/nanodbc.cpp:1374: 00000: SQL compilation error:
Schema 'WH_a."WH_b.schema_c"'  does not exist or not authorized.

Looks like the in_schema call inherits the current database and is not able to go one level above. I've been all over the docs, but most examples refer to much simpler databases where this is not an issue. Testing various combinations of setting and unsetting different schemas/warehouses did not result in success... Eventually I did find work around by passing a straight sql statement

ca <- tbl(conn, sql("SELECT * FROM WH_b.schema_c.table_ca")) 

However, this creates very ugly (and possibly inefficient) SQL code, where the select statement is inserted in brackets instead of just a table name. It is much harder to read and just does feel like the right thing to do in the long run

Is there a simpler/more efficient solution to this?

Many thanks

Upvotes: 1

Views: 467

Answers (1)

Fieldy
Fieldy

Reputation: 601

This maybe because, unlike Snowflake, quite a few databases don't support cross database joins so maybe not considered in the design of dbplyr.

You can use the sql() function to enclose the database.schema or database.schema.table strings which from the testing below seems to work.

# Create Databases, Schemas & Tables
dbGetQuery(conn, "CREATE DATABASE WH_A")
dbGetQuery(conn, "CREATE SCHEMA SCHEMA_A")
dbGetQuery(conn, "USE DATABASE WH_A")
dbGetQuery(conn, "USE SCHEMA SCHEMA_A")
dbGetQuery(conn, "CREATE TABLE TABLE_AA as 
                         select c1, c2
                          from (values (1, 'one'), (2, 'two')) as v1 (c1, c2);")
dbGetQuery(conn, "CREATE SCHEMA SCHEMA_B")
dbGetQuery(conn, "USE SCHEMA SCHEMA_B")
dbGetQuery(conn, "CREATE TABLE TABLE_BA as 
                         select c1, c2
                          from (values (1, 'one'), (2, 'two')) as v1 (c1, c2);")
dbGetQuery(conn, "CREATE DATABASE WH_B")
dbGetQuery(conn, "USE DATABASE WH_B")
dbGetQuery(conn, "CREATE SCHEMA SCHEMA_C")
dbGetQuery(conn, "USE SCHEMA SCHEMA_C")
dbGetQuery(conn, "SELECT CURRENT_DATABASE() as db, CURRENT_SCHEMA() as sc;")
dbGetQuery(conn, "CREATE TABLE TABLE_CA as 
                         select c1, c2
                          from (values (1, 'one'), (2, 'two')) as v1 (c1, c2);")
# Create tbl for TABLE_AA
dbGetQuery(conn, "USE DATABASE WH_A")
dbGetQuery(conn, "USE SCHEMA SCHEMA_A")
(aa <- tbl(conn, "TABLE_AA") )
aa$ops # Check database.schema.table reference

# Create tbl for TABLE_BA
(ba <- tbl(conn, in_schema("SCHEMA_B", "TABLE_BA")) )
ba$ops

# Create tbl for TABLE_CA
(ca <- tbl(conn, in_schema("WH_B.SCHEMA_C", "TABLE_CA")))       # Fails
(ca <- tbl(conn, in_schema(sql("WH_B.SCHEMA_C"), "TABLE_CA")))  # Works
ca$ops
# From: WH_B.SCHEMA_C."TABLE_CA"
# <Table: WH_B.SCHEMA_C."TABLE_CA">
ca <- tbl(conn, sql("WH_B.SCHEMA_C.TABLE_CA"))  # Also Works
ca$ops
# From: <derived table>
# <Table: WH_B.SCHEMA_C.TABLE_CA>

# Check cross schema join works
inner_join(aa,ba,by='C1')
# Check cross database join works  
inner_join(aa,ca,by='C1')

# Clean up 
dbGetQuery(conn, "DROP DATABASE WH_A")
dbGetQuery(conn, "DROP DATABASE WH_B")

If your database objects are using mixed case you will need to take care to quote them correctly within Sql()

dbGetQuery(conn, 'CREATE DATABASE "WH_b" ')
dbGetQuery(conn, 'USE DATABASE "WH_b" ')
dbGetQuery(conn, 'CREATE SCHEMA "schema_c" ')
dbGetQuery(conn, 'USE SCHEMA "schema_c" ')
dbGetQuery(conn, 'SELECT CURRENT_DATABASE() as db, CURRENT_SCHEMA() as sc;')
dbGetQuery(conn, 'CREATE TABLE "table_ca" as 
                         select c1, c2
                          from (values (1, \'one\'), (2, \'two\')) as v1 (c1, c2);')
dbGetQuery(conn, 'SELECT * FROM  "WH_b"."schema_c"."table_ca";')

(ca <- tbl(conn, sql(' "WH_b"."schema_c"."table_ca" ')))  # Quoted mixed case object names also Works
ca$ops

# Clean up 
dbGetQuery(conn, 'DROP DATABASE "WH_b" ')

Upvotes: 1

Related Questions