William

Reputation: 131

Using Spark Thrift Server with Delta Lake

I'm trying to set up a local Spark cluster with Delta Lake on a set of Raspberry Pis, and I'm running into an issue when I try to connect to the Thrift Server.

I'm starting it like so:

$SPARK_HOME/sbin/start-thriftserver.sh \
    --master spark://{{ ansible_host }}:7077 \
    --packages io.delta:delta-spark_2.12:3.2.1 \
    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
    --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension

The issue comes when I try to connect with beeline.
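
For reference, I'm connecting along these lines (the hostname spark-master is a placeholder; 10000 is the Thrift Server's default port):

# spark-master stands in for whichever node is running the Thrift Server
$SPARK_HOME/bin/beeline -u jdbc:hive2://spark-master:10000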

I get this error from beeline:

Can't overwrite cause with java.lang.ClassNotFoundException: org.apache.spark.sql.delta.catalog.DeltaCatalog

And I get this one in the Thrift Server logs:

Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.spark.sql.delta.catalog.DeltaCatalog.

The second error makes me suspect that spark.sql.catalog.spark_catalog isn't getting set correctly. The docs say that start-thriftserver.sh accepts all the same command-line options as spark-submit, though.

I've also set the same options in my spark-defaults.conf:

spark.jars.packages              io.delta:delta-spark_2.12:3.2.1
spark.sql.extensions             io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog  org.apache.spark.sql.delta.catalog.DeltaCatalog
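
As a sanity check that spark-defaults.conf is being picked up at all, I can print the value as a plain spark-sql session sees it (just a sketch):

# Prints the catalog implementation the session resolved from spark-defaults.conf
$SPARK_HOME/bin/spark-sql -e "SET spark.sql.catalog.spark_catalog;"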

I've also tried running spark-sql with all the same arguments, and that works with no problem: I'm able to read my Delta table. The issue is specific to the Thrift Server.
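
Concretely, an invocation along these lines works against the same cluster (the master hostname and table path are placeholders):

# /data/my-table is a placeholder for the actual Delta table path
$SPARK_HOME/bin/spark-sql \
    --master spark://spark-master:7077 \
    --packages io.delta:delta-spark_2.12:3.2.1 \
    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
    --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
    -e "SELECT * FROM delta.\`/data/my-table\` LIMIT 10"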

I think the Thrift Server spins up a new Spark context for each connection? My hunch is that this new context is missing some of the configuration when that happens, but I don't know how to confirm or address that.
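
One thing I suppose I could try (a sketch, assuming the default port) is comparing the application ID that two separate connections report; separate contexts would show up as different app IDs:

# If each JDBC connection really got its own SparkContext, these two
# runs should print different spark.app.id values
$SPARK_HOME/bin/beeline -u jdbc:hive2://spark-master:10000 -e "SET spark.app.id;"
$SPARK_HOME/bin/beeline -u jdbc:hive2://spark-master:10000 -e "SET spark.app.id;"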

Any thoughts?

Upvotes: 0

Views: 84

Answers (0)
