Zop

Reputation: 117

PySpark and Phoenix tables

I would like to use Phoenix tables with PySpark. I tried the solution I found here: https://phoenix.apache.org/phoenix_spark.html

But I get an error. Can you help me solve it?

df_metadata = sqlCtx.read.format("org.apache.phoenix.spark").option("zkUrl", "xxx").load("lib.name_of_table")
print(df_metadata.collect())

And this is the error:

py4j.protocol.Py4JJavaError: An error occurred while calling o103.load. : java.lang.ClassNotFoundException: Failed to find data source: org.apache.phoenix.spark. Please find packages at http://spark-packages.org

How can I use org.apache.phoenix.spark with PySpark?

Upvotes: 1

Views: 3692

Answers (2)

Nikhil JSK

Reputation: 15

I know the answer given by @Zop works.

I got this error: py4j.protocol.Py4JJavaError: An error occurred while calling o53.load. : java.lang.ClassNotFoundException: Failed to find data source: org.apache.phoenix.spark. Please find packages at http://spark.apache.org/third-party-projects.html

You can do it this way too:

spark-submit --jars /usr/hdp/current/phoenix-client/phoenix-spark2.jar,/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.4.0-91-client.jar,/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.4.0-91-server.jar <file here>
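
If you prefer not to pass the jars on the command line, they can also be registered from Python through Spark's standard spark.jars property when the session is created. A minimal sketch, assuming Spark 2.x and the HDP jar paths from the command above (adjust them for your install; the zkUrl and table name are the question's placeholders):

from pyspark.sql import SparkSession

# Put the phoenix-spark connector and Phoenix client jars on the
# driver/executor classpath via spark.jars instead of the --jars flag.
spark = (SparkSession.builder
         .appName("phoenix-read")
         .config("spark.jars",
                 "/usr/hdp/current/phoenix-client/phoenix-spark2.jar,"
                 "/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.4.0-91-client.jar")
         .getOrCreate())

# The data source now resolves because the connector jar is on the classpath.
df = (spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "lib.name_of_table")  # table option, as in the linked phoenix_spark docs
      .option("zkUrl", "xxx")                # ZooKeeper quorum placeholder from the question
      .load())
df.show()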

Upvotes: 0

Zop

Reputation: 117

OK, I found how to correct this code: I added this part to my spark-submit:

--jars /opt/phoenix-4.8.1-HBase-1.2/phoenix-spark-4.8.1-HBase-1.2.jar,/opt/phoenix-4.8.1-HBase-1.2/phoenix-4.8.1-HBase-1.2-client.jar \
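
For completeness, a minimal script that such a spark-submit could run once the jars are on the classpath. It mirrors the code from the question, with the table passed through the table option as in the linked phoenix_spark documentation (the zkUrl and table name are the question's placeholders):

from pyspark import SparkContext
from pyspark.sql import SQLContext

# The Phoenix jars come from the --jars flag above, so the
# org.apache.phoenix.spark data source can be found at runtime.
sc = SparkContext(appName="phoenix-read")
sqlCtx = SQLContext(sc)

df_metadata = (sqlCtx.read
               .format("org.apache.phoenix.spark")
               .option("table", "lib.name_of_table")  # Phoenix table to read
               .option("zkUrl", "xxx")                # ZooKeeper quorum placeholder
               .load())
print(df_metadata.collect())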

Upvotes: 2
