Reputation: 27
I am trying to load a DataFrame (df) into Snowflake. The table is already created in Snowflake, and I am following exactly what is written in the documentation.
This is what I am doing:
df.show()
sfOptions = {
"sfURL" : "",
"sfAccount" : "",
"sfUser" : "",
"sfPassword" : "",
"sfDatabase" : "",
"sfSchema" : "",
"sfWarehouse" : "",
"sfRole" : "",
}
(with the appropriate values filled in for each option)
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
df.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("dbtable", "<tablename>").mode('append').options(header=True).save()
I got this error:
**: java.lang.ClassNotFoundException: Failed to find data source: net.snowflake.spark.snowflake**
I added the Spark-Snowflake connector and the Snowflake JDBC driver to PATH in my environment variables and referenced them while creating the Spark session as well, but the issue persists.
I have tried multiple routes with no luck. Any leads would be appreciated.
Upvotes: 1
Views: 5425
Reputation: 757
You need to add the spark-snowflake and snowflake-jdbc packages when you launch your pyspark command:
pyspark --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4
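(The package coordinates encode the Scala and Spark versions: spark-snowflake_2.11 is built for Scala 2.11, and 2.4.14-spark_2.4 targets Spark 2.4, so pick coordinates that match your own Spark installation.)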
or, if you have the jar files locally, pass them with --jars (comma-separated):
pyspark --jars spark-snowflake.jar,snowflake-jdbc.jar
or, if you want to do it from your Python code, set the jars on the Spark configuration when you build the SparkSession (the connector has to be on the JVM classpath before the session starts, so adding it after the fact will not help):
spark = SparkSession.builder.config("spark.jars", "/path/to/spark-snowflake.jar,/path/to/snowflake-jdbc.jar").getOrCreate()
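For completeness, here is a minimal end-to-end sketch of the packages route, assuming Spark 2.4 with Scala 2.11; the connection values, table name, and sample data are placeholders, and spark.jars.packages only takes effect if it is set before the first SparkSession/JVM is created (e.g. when the script is run with plain python or spark-submit, not inside an already-running pyspark shell):

from pyspark.sql import SparkSession

# Pull the connector and the JDBC driver from Maven at session start-up
# (same coordinates as the --packages example above; adjust to your versions).
spark = (
    SparkSession.builder
    .appName("snowflake-load")
    .config("spark.jars.packages",
            "net.snowflake:snowflake-jdbc:3.8.0,"
            "net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4")
    .getOrCreate()
)

sfOptions = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    "sfRole": "<role>",
}

# Small illustrative DataFrame; in your case this is the df you already have.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# With the connector on the classpath, the original write call works unchanged.
(
    df.write
    .format("net.snowflake.spark.snowflake")
    .options(**sfOptions)
    .option("dbtable", "<tablename>")
    .mode("append")
    .save()
)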
Upvotes: 2