Chao Mu

Reputation: 109

Connection from Spark to snowflake

I am writing this not to ask a question, but to share the knowledge. I was using Spark to connect to Snowflake, but I could not access Snowflake. It seemed like there was something wrong with the internal JDBC driver in Databricks.

Here is the error I got:

java.lang.NoClassDefFoundError: net/snowflake/client/jdbc/internal/snowflake/common/core/S3FileEncryptionMaterial

I tried many versions of the Snowflake JDBC driver and the Spark-Snowflake connector, but it seemed like I could not find a matching pair.

Upvotes: 7

Views: 5264

Answers (4)

Jie

Reputation: 1264

If you are using Databricks, there is a Databricks Snowflake connector created jointly by Databricks and Snowflake. You just have to provide a few connection options to create a Spark DataFrame (see below; copied from the Databricks documentation).

# snowflake connection options
options = dict(sfUrl="<URL for your Snowflake account>",
               sfUser=user,
               sfPassword=password,
               sfDatabase="<The database to use for the session after connecting>",
               sfSchema="<The schema to use for the session after connecting>",
               sfWarehouse="<The default virtual warehouse to use for the session after connecting>")

df = spark.read \
  .format("snowflake") \
  .options(**options) \
  .option("dbtable", "<The name of the table to be read>") \
  .load()

display(df)
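
Writing a DataFrame back to Snowflake follows the same pattern. A minimal sketch, reusing the options dict from above; the target table name and the append mode are placeholders to adapt:

# Write the DataFrame back to Snowflake
df.write \
  .format("snowflake") \
  .options(**options) \
  .option("dbtable", "<The name of the table to write to>") \
  .mode("append") \
  .save()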

As long as you are accessing your own databases with all the access rights granted correctly, this only takes a few minutes, even on a first attempt.

Good luck!

Upvotes: 5

Ankur Srivastava

Reputation: 923

You need to set the CLASSPATH variable to point to the jars, as shown below. You also need to set SPARK_HOME and SCALA_HOME, in addition to PYTHONPATH.

export CLASSPATH=/snowflake-jdbc-3.8.0.jar:/spark-snowflake_2.11-2.4.14-spark_2.4.jar


You can also load the jars in memory from your code to resolve this issue:

spark = SparkSession \
    .builder \
    .config("spark.jars", "file:///app/snowflake-jdbc-3.9.1.jar,file:///app/spark-snowflake_2.11-2.5.3-spark_2.2.jar") \
    .config("spark.repl.local.jars", "file:///app/snowflake-jdbc-3.9.1.jar,file:///app/spark-snowflake_2.11-2.5.3-spark_2.2.jar") \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .getOrCreate()
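
If the machine can reach Maven Central, a variant is to let Spark resolve the same jars by coordinate instead of pointing at local files. A minimal sketch, assuming the versions used above (adjust them to your Spark and Scala versions):

from pyspark.sql import SparkSession

# Let Spark download the JDBC driver and the connector from Maven Central
spark = SparkSession \
    .builder \
    .config("spark.jars.packages",
            "net.snowflake:snowflake-jdbc:3.9.1,"
            "net.snowflake:spark-snowflake_2.11:2.5.3-spark_2.2") \
    .getOrCreate()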


Upvotes: 1

Dennis Jaheruddin

Reputation: 21563

Answer as given by the asker (I just extracted it from the question for better site usability):


Step 1: Create a cluster with Spark version 2.3.0 and Scala version 2.11.
Step 2: Attach snowflake-jdbc-3.5.4.jar to the cluster. https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc/3.5.4
Step 3: Attach the spark-snowflake_2.11-2.3.2 driver to the cluster. https://mvnrepository.com/artifact/net.snowflake/spark-snowflake_2.11/2.3.2

Here is the sample code.

import org.apache.spark.sql.DataFrame

val SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

// Connection options for the Spark-Snowflake connector
val sfOptions = Map(
    "sfURL" -> "<snowflake_url>",
    "sfAccount" -> "<your account name>",
    "sfUser" -> "<your account user>",
    "sfPassword" -> "<your account pwd>",
    "sfDatabase" -> "<your database name>",
    "sfSchema" -> "<your schema name>",
    "sfWarehouse" -> "<your warehouse name>",
    "sfRole" -> "<your account role>",
    "region_id" -> "<your region name, if you are out of us region>"
)

// Read the Snowflake table into a DataFrame
val df: DataFrame = sqlContext.read
    .format(SNOWFLAKE_SOURCE_NAME)
    .options(sfOptions)
    .option("dbtable", "<your table>")
    .load()

Upvotes: 6

Sandy

Reputation: 279

Please update to the latest version of the Snowflake JDBC driver (3.2.5); that should resolve this issue. Thanks!

Upvotes: -4
