drymolasses

Reputation: 105

Snowpark with Python: AttributeError: 'NoneType' object has no attribute 'join'

I'm trying to use Snowpark & Python to transform and prep some data ahead of using it for some ML models. I've been able to easily use session.table() to access the data and select(), col(), filter(), and alias() to pick out the data I need. I'm now trying to join data from two different DataFrame objects, but running into an error.

My code to get the data:


import pandas as pd
df1 = read_session.table("<SCHEMA_NAME>.<TABLE_NAME>").select(col("ID"),
                             col("<col_name1>"),
                             col("<col_name2>"),
                             col("<col_name3>")          
                            ).filter(col("<col_name2>") == 'A1').show()

df2 = read_session.table("<SCHEMA_NAME>.<TABLE_NAME2>").select(col("ID"),
                             col("<col_name1>"),
                             col("<col_name2>"),
                             col("<col_name3>")          
                            ).show()

Code to join:

df_joined = df1.join(df2, ["ID"]).show()

Error: AttributeError: 'NoneType' object has no attribute 'join'

I have also used this method (from the Snowpark Python API documentation) and get the same error:

df_joined = df1.join(df2, df1.col("ID") == df2.col("ID")).select(df1["ID"], "<col_name1>", "<col_name2>").show()

I get similar errors when trying to convert to a DataFrame using pd.DataFrame and then trying to write it back to Snowflake to a new DB and Schema.

What am I doing wrong? Am I misunderstanding what Snowpark can do; isn't it part of the appeal that all these transformations can be easily done with the objects rather than as a full DataFrame? How can I get this to work?

Upvotes: 0

Views: 3116

Answers (1)

clb

Reputation: 106

The primary issue is that you are assigning the output of the .show() method call to the variable, not the Snowpark DataFrame itself. It is best practice to assign the Snowpark DataFrame to a variable, and then call .show() on that variable when you need to see the results.

Snowpark DataFrame transformations are lazily executed. Calling .show() forces execution and prints the results; it does not give you back a reference to the underlying data and transformations. So, for example:

from snowflake.snowpark.functions import col  # col() is imported from here

df1 = read_session.table("<SCHEMA_NAME>.<TABLE_NAME>").select(col("ID"),
                             col("<col_name1>"),
                             col("<col_name2>"),
                             col("<col_name3>")          
                            ).filter(col("<col_name2>") == 'A1')
df1.show()

df2 = read_session.table("<SCHEMA_NAME>.<TABLE_NAME2>").select(col("ID"),
                             col("<col_name1>"),
                             col("<col_name2>"),
                             col("<col_name3>")          
                            )
df2.show()

df_joined = df1.join(df2, df1.col("ID") == df2.col("ID")).select(df1["ID"], "<col_name1>", "<col_name2>")
df_joined.show()

Otherwise, you are assigning the return value of .show(), which is None, to your variable, hence the error you are seeing: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/_autosummary/snowflake.snowpark.html#snowflake.snowpark.DataFrame.show
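
If it helps to see the mechanism without a Snowflake connection, here is a minimal sketch in plain Python. The Frame class is a hypothetical stand-in I made up for illustration, not the Snowpark API, and it is not lazy; it only mimics the return types that matter here: filter() and join() return a new object, while show() prints and returns None.

```python
class Frame:
    """Hypothetical stand-in that only mimics return types (NOT the Snowpark API)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        # Transformations return a new Frame, so they can be chained.
        return Frame(r for r in self.rows if predicate(r))

    def join(self, other, key):
        # Naive inner join on a single key column.
        return Frame(
            {**left_row, **right_row}
            for left_row in self.rows
            for right_row in other.rows
            if left_row[key] == right_row[key]
        )

    def show(self):
        # Prints the rows; there is no return statement, so the result is None.
        for row in self.rows:
            print(row)


left = Frame([{"ID": 1, "grade": "A1"}, {"ID": 2, "grade": "B2"}])
right = Frame([{"ID": 1, "score": 10}])

# Same mistake as in the question: the variable holds show()'s return value.
broken = left.filter(lambda r: r["grade"] == "A1").show()

try:
    broken.join(right, "ID")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

# Fix: keep the frame in the variable and call show() separately.
fixed = left.filter(lambda r: r["grade"] == "A1")
joined = fixed.join(right, "ID")
joined.show()
```

Here broken ends up as None, so the join call fails with the same AttributeError as in the question, while fixed still refers to a frame and can be joined.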

Upvotes: 2
