cunningham1993
cunningham1993

Reputation: 1

Snowpark dataframe not returning with .collect() and will not allow looping through

Basically what I'm trying to do is use the info schema to generate a dataframe of drop commands that I would then want to iterate through and execute.

import snowflake.snowpark as snowpark

def main(session: snowpark.Session): 

    drop_commands = session.sql(""" SELECT 'DROP ' || table_type ||  ' if exists <database>.' || table_schema || '.' || table_name || ';' as drop_command
                                    FROM  <database>.INFORMATION_SCHEMA.TABLES 
                                    WHERE table_catalog = '<database>' 
                                        and table_schema in '<schema>'""").collect()
    
    for iter,row in df:
        session.sql(row['drop_command']).collect()

    # drop_commands.show()
    # Return value will appear in the Results tab.
    return drop_commands

I'm expecting that the drop_commands object will return as a data frame and then the loop will iterate through each drop command and drop snowflake object that I tell it to using the query. When I comment out the loop and get rid of the .collect() at the end of the drop_commands definition then I can see the data frame returned in the results... However, this is contradictory to the documentation that Snowflake has available (* .collect() executes the sql *). I've tried different versions of the loop as well because of reading about the differences between a snowpark and pandas data frame to no avail.

Note that I would like to have these snowpark issues addressed regardless of whether there is an easier way to purge certain tables/schemas/databases - however if anyone does know how to do that programmatically that'd be great feedback as well.

Upvotes: 0

Views: 3023

Answers (1)

Lukasz Szozda
Lukasz Szozda

Reputation: 176064

I'm expecting that the drop_commands object will return as a data frame

session.sql(""" """).collect()

First snowflake.snowpark.DataFrame.collect returns List[Row]

Second it is perfectly possible to run session.sql().collect() inside a loop:

import snowflake.snowpark as snowpark

def main(session: snowpark.Session):

    drop_commands = session.sql(""" SELECT 'DROP ' || REPLACE(table_type, 'BASE', '') ||  ' if exists TEST.' || table_schema || '.' || table_name || ';' as drop_command
                                    FROM  TEST.INFORMATION_SCHEMA.TABLES 
                                    WHERE table_catalog = 'TEST' 
                                        and table_schema in ('TEST')""").collect()
    
    for row in drop_commands:
        session.sql(row[0]).collect()

    return drop_commands

For env:

CREATE OR REPLACE DATABASE TEST;
CREATE OR REPLACE SCHEMA TEST;


CREATE TABLE t1(i INT);
CREATE TABLE t2(i INT);

SHOW TABLES IN SCHEMA TEST.TEST;
-- T1
-- T2

Output:

enter image description here

SHOW TABLES IN SCHEMA TEST.TEST;
-- Query produced no results

And finally: INFORMATION_SCHEMA.TABLES.TABLE_TYPE values are not only VIEW/TABLE so it cannot be concatenated with DROP DML without transfromation (BASE TABLE/EXTERNAL TABLE/VIEW/...).

Upvotes: 1

Related Questions