I am trying to stream event data from an Azure Event Hub into a lakehouse using Spark Structured Streaming in a Fabric notebook. The event payload is Protobuf-serialized and base64-encoded. I want to decode it with the from_protobuf() function described in the Protobuf Data Source Guide (Spark 3.5.4 documentation).
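For context, here is a minimal sketch of the decoding step, not a tested Fabric setup. The helper names decode_body and decode_events, the column name body, and the message name and descriptor path passed in are all placeholders. Per the Spark documentation, the PySpark from_protobuf() takes a message name plus a descriptor file (or a Java class name) rather than Python modules, so the sketch assumes a descriptor file produced from the .proto schemas.

```python
import base64


def decode_body(body_b64: str) -> bytes:
    """Undo the base64 layer of one event body, leaving the raw Protobuf bytes."""
    return base64.b64decode(body_b64)


def decode_events(events_df, message_name: str, descriptor_path: str):
    """Return a DataFrame with a single struct column holding the decoded message.

    Assumes `events_df` has a `body` column (hypothetical name) that contains
    the base64 text of a Protobuf-serialized message.
    """
    # Imported here so decode_body stays usable where Spark is not installed.
    from pyspark.sql.functions import col, unbase64
    from pyspark.sql.protobuf.functions import from_protobuf

    # base64 text -> raw Protobuf bytes
    raw = unbase64(col("body").cast("string"))
    return events_df.select(
        from_protobuf(raw, message_name, descFilePath=descriptor_path).alias("event")
    )
```

decode_body only serves to inspect a single payload locally. decode_events is the call that needs the spark-protobuf package on the class path.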
However, I am getting the following error message:
Spark Protobuf libraries not found in class path. Try one of the following.
1. Include the Protobuf library and its dependencies with in the spark-submit command as
   $ bin/spark-submit --packages org.apache.spark:spark-protobuf:3.4.3.5.3.20241016.1 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-protobuf, Version = 3.4.3.5.3.20241016.1. Then, include the jar in the spark-submit command as
   $ bin/spark-submit --jars <spark-protobuf.jar> ...
How do I do this in a Fabric notebook environment? Is there a way to include the library mentioned in the error?
Also, I have two Python modules containing the classes generated from the .proto schemas, which are required to decode the payload. Where do I have to put them so that I can pass them to from_protobuf()?
Looking forward to any ideas on this! Thanks a lot and best, flo.