zaini

Reputation: 151

How to decode protobuf data in a Fabric PySpark notebook using the from_protobuf() method?

I am trying to stream event data from an Azure Event Hub to a lakehouse using Spark Structured Streaming from within a Fabric notebook. The event payload is protobuf-serialized and base64-encoded. I wanted to use the from_protobuf() method (see the Protobuf Data Source Guide in the Spark 3.5.4 documentation) to decode the payload.
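To make it concrete, this is roughly the pipeline I have in mind (the column name, message name, and descriptor path are placeholders; I am not sure yet where the descriptor file should live):

    from pyspark.sql.functions import col, unbase64
    from pyspark.sql.protobuf.functions import from_protobuf

    # df is the streaming DataFrame read from the Event Hub; "body" is
    # assumed to hold the base64-encoded protobuf payload as a string
    decoded = (
        df
        .withColumn("raw", unbase64(col("body")))  # base64 text -> binary
        .withColumn(
            "event",
            from_protobuf(
                "raw",
                "MyEvent",  # placeholder message name from the .proto file
                descFilePath="/lakehouse/default/Files/schemas/events.desc",  # placeholder path
            ),
        )
    )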

However, I am getting the following error message:

Spark Protobuf libraries not found in class path. Try one of the following.

1. Include the Protobuf library and its dependencies with in the spark-submit command as

   $ bin/spark-submit --packages org.apache.spark:spark-protobuf:3.4.3.5.3.20241016.1 ...

2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-protobuf, Version = 3.4.3.5.3.20241016.1. Then, include the jar in the spark-submit command as

   $ bin/spark-submit --jars <spark-protobuf.jar> ...

Now my question is: how do I do this in a Fabric notebook environment? Is there a way to include the mentioned library?
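The closest thing I can think of is setting spark.jars.packages via the %%configure magic at the start of the notebook, assuming Fabric supports this the way Synapse does (the coordinates below are a guess based on Maven Central and would need to match the runtime's Spark and Scala versions):

    %%configure -f
    {
        "conf": {
            "spark.jars.packages": "org.apache.spark:spark-protobuf_2.12:3.4.1"
        }
    }

But I don't know whether this is the intended approach in Fabric, hence the question.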

Also: I have two Python modules that contain the classes generated from the .proto schemas, which are required to decode the payload. Where do I have to put these so that I can hand them to the from_protobuf() method?
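Reading the PySpark docs, it looks like from_protobuf() expects a descriptor file (descFilePath) or a binary descriptor set rather than the Python classes generated by protoc, so maybe I need a .desc file instead? For reference, I would generate one like this (file names are placeholders):

    protoc --include_imports --descriptor_set_out=events.desc events.proto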

Looking forward to any ideas on this! Thanks a lot and best, flo.

Upvotes: 0

Views: 31

Answers (0)
