Joon

Reputation: 2147

Azure Event Hubs Maven package installation on Synapse

I am testing the Spark capabilities of Azure Synapse Analytics as an alternative to Databricks. I am trying to port a Delta Lake job that already works on Databricks to Azure Synapse.

To receive messages from the event hub, I have the following pyspark code:

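# readStream is an attribute of the SparkSession (`spark`, predefined in
# Synapse and Databricks notebooks), not of the SparkContext (`sc`).
conf = {}
conf["eventhubs.connectionString"] = connectionString
read_df = (
  spark
    .readStream
    .format("eventhubs")
    .options(**conf)
    .load()
)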

That code fails with "java.lang.ClassNotFoundException: Failed to find data source: eventhubs" unless the Maven package com.microsoft.azure:azure-eventhubs-spark_x:X is installed.

I am stuck on how to install that package.

I've tried adding it to a Spark properties file called job_props.txt with the following content:

spark.jars.packages com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13
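One thing that may be worth ruling out (an assumption, not a confirmed cause of the Livy error) is a mismatch between the Scala suffix in the package coordinate (`_2.11` here) and the Scala version your pool's Spark runtime was built with. The hypothetical helpers below parse a properties file in the whitespace-separated "key value" format shown above and extract the Scala suffix from each `groupId:artifactId_scala:version` coordinate, so you can compare it with what the pool reports. They do not install anything; the function names are illustrative only.

```python
import re

# Hypothetical pattern for Scala library coordinates: groupId:artifactId_<scala>:version
COORD = re.compile(r"^([\w.\-]+):([\w.\-]+?)_(\d+\.\d+):([\w.\-]+)$")


def parse_spark_properties(text):
    """Parse 'key value' lines (spark-defaults style), skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then everything after the first whitespace run
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def scala_suffixes(packages):
    """Map each comma-separated coordinate to the Scala binary version in its artifactId."""
    result = {}
    for coord in packages.split(","):
        coord = coord.strip()
        match = COORD.match(coord)
        if match is None:
            raise ValueError(f"not a groupId:artifactId_scala:version coordinate: {coord!r}")
        result[coord] = match.group(3)
    return result


props = parse_spark_properties(
    "spark.jars.packages com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13\n"
)
print(scala_suffixes(props["spark.jars.packages"]))
```

If the printed suffix differs from the Scala version of the pool's Spark runtime, the coordinate likely needs the matching `_2.xx` artifact.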

When I attach this file through the Spark cluster's "Spark Config file" option, the cluster fails on startup with Livy process termination errors.

How can I install the Azure Event Hubs package for a PySpark job in Azure Synapse?

Upvotes: 0

Views: 650

Answers (1)

Joon

Reputation: 2147

Got an answer on this from our Microsoft account rep.

According to them, you currently cannot read from Kafka in a Synapse Spark pool the way you can in Databricks. The problem is that although Synapse Spark pools let you load Python libraries, the Kafka Python libraries actually wrap Java libraries, and that is not supported right now.

Upvotes: 1
