Abhirup

Reputation: 27

How to load a dataframe in PySpark to Snowflake

I am trying to load a DataFrame (df) into Snowflake. The table already exists in Snowflake, and I am following exactly what's written in the documentation.

I am doing the below:

df.show()

sfOptions = {
"sfURL"       : "",
"sfAccount"   : "",
"sfUser"      : "",
"sfPassword"  : "",
"sfDatabase"  : "",
"sfSchema"    : "",
"sfWarehouse" : "",
"sfRole"      : "",
}

(with the appropriate values filled in for each key)

SNOWFLAKE_SOURCE_NAME= "net.snowflake.spark.snowflake"

df.write.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("dbtable", "<tablename>") \
    .mode('append') \
    .options(header=True) \
    .save()

I got this error:

**: java.lang.ClassNotFoundException: Failed to find data source: net.snowflake.spark.snowflake**

I added the Snowflake Spark connector and the Snowflake JDBC driver to PATH in the environment variables, and referenced them while creating the Spark session as well, but the issue persists.

I tried multiple routes but no luck. Any lead will be appreciated.

Upvotes: 1

Views: 5425

Answers (1)

Yassine Abdul-Rahman

Reputation: 757

You need to add the spark-snowflake and snowflake-jdbc packages when running your pyspark command:

pyspark --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4

or, if you have the jar files locally, pass them with --jars (comma-separated; --py-files is only for Python files and will not put jars on the classpath):

pyspark --jars spark-snowflake.jar,snowflake-jdbc.jar

or, to do it directly in your Python code, set spark.jars when building the session (sparkContext.addPyFile only distributes Python files and will not load a jar onto the JVM classpath):

spark = SparkSession.builder \
    .config("spark.jars", "/path/to/jar/xxxx.jar") \
    .getOrCreate()
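Alternatively, you can let Spark resolve the connector from Maven at session creation instead of pointing at local jars. A minimal sketch, reusing the package coordinates from the pyspark --packages command above; the _2.11 / spark_2.4 suffixes are assumptions and must match your Scala and Spark versions:

```python
from pyspark.sql import SparkSession

# spark.jars.packages tells Spark to download these artifacts
# (and their dependencies) from Maven Central at startup.
spark = (
    SparkSession.builder
    .appName("snowflake-load")
    .config(
        "spark.jars.packages",
        "net.snowflake:snowflake-jdbc:3.8.0,"
        "net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4",
    )
    .getOrCreate()
)
```

With the packages on the classpath, the original df.write.format("net.snowflake.spark.snowflake") call should then find the data source.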

Upvotes: 2
