hive user
hive user

Reputation: 3

Extract DB name and table name from snowflake query using snowsql

Insert into D.d
Select * from A.a join B.b on
A.a.a1=B.b.b1 
Join C.c on C.c.c1=B.b.b1

I have complex statements for which i need to extract source db name ( in above statement source DB are A,B,C and source tables are a,b,c &Target Db is D and target table is d)

Need output like

SourceDB SourceTbl TargetDB Targettbl A,B,C a,b,c D d

Or we can get values in json format as well for each field.. Also this needs to accomodate for update and delete statements as well. Please assist

Thanks

Upvotes: 0

Views: 885

Answers (2)

Daniel Odievich
Daniel Odievich

Reputation: 161

Snowflake does not offer any SQL statement parsing support. You can hack at it with regex'es, of course, or use any of the tools on the market.

If this query ran, and ran successfully, you can use ACCESS_HISTORY view https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html to see which tables (A.a, B.b, C.c, D.d) and columns (A.a.a1, B.b.b1, C.c.c1, D.d.d1) it accessed and how (read or write).

Upvotes: 0

Rajib Deb
Rajib Deb

Reputation: 1774

You can use the SQLPARSE to parse the statement. I am providing a code below which is not optimally and efficiently written, but it has the logic to get the information

import sqlparse
raw = 'Insert into D.d ' \
      'Select * from A.a join B.b on ' \
      'A.a.a1=B.b.b1  Join C.c on C.c.c1=B.b.b1;'

parsed = sqlparse.parse(raw)[0]

tgt_switch = "N"
src_switch = "N"
src_table=[]
tgt_table= ""
for items in parsed.tokens:
    #print(items,items.ttype)
    if str(items)  == "into":

        tgt_switch ="Y"
    if tgt_switch == "Y" and items.ttype is None:
        tgt_switch = "N"
        tgt_table = items

    if str(items).lower() == "from" or str(items).lower() == "join":
        src_switch = "Y"
    if src_switch == "Y" and items.ttype is None:
        src_switch = "N"
        src_table.append(str(items))

target_db = str(tgt_table).split(".")[0]
target_tbl = str(tgt_table).split(".")[1]
print("Target DB is {} and Target table is {}".format(target_db,target_tbl))
for obj in src_table:
    src_db = str(obj).split(".")[0]
    src_tbl = str(obj).split(".")[1]
    print("Source DB is {} and Source table is {}".format(src_db, src_tbl))

Upvotes: 1

Related Questions