Reputation: 158
With Apache Spark 2.1, I would like to use Kafka (0.10.0.2.5) as a source for Structured Streaming with pyspark.
The Kafka topic contains JSON messages (pushed with StreamSets Data Collector). However, I am not able to read them with the following code:
# Read the topic as a streaming DataFrame
kafka = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:6667") \
    .option("subscribe", "mytopic") \
    .load()

# The Kafka value column is binary; cast it to a string
msg = kafka.selectExpr("CAST(value AS STRING)")

# Write the stream to the console for inspection
disp = msg.writeStream.outputMode("append").format("console").start()
It generates this error:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
I tried adding the following options to the readStream call:
.option("value.serializer","org.common.serialization.StringSerializer")
.option("key.serializer","org.common.serialization.StringSerializer")
But that does not solve the problem.
Any ideas? Thank you in advance.
Upvotes: 5
Views: 2841
Reputation: 158
Actually, I found the solution: I added the following jar as a dependency:
spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar
(after downloading it from https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0)
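As an alternative to downloading the jar by hand, spark-submit can resolve it from Maven Central at launch time with --packages. Note that for the Structured Streaming Kafka source, the Spark documentation points to the spark-sql-kafka-0-10 artifact (rather than the spark-streaming-kafka-0-10 one used for the older DStream API); the Scala version suffix must match your Spark build. A minimal sketch, where my_streaming_app.py is a placeholder for your script:

```shell
# _2.11 assumes a Scala 2.11 build of Spark 2.1.0; use _2.10 for a Scala 2.10 build.
# The artifact is fetched and put on both the driver and executor classpaths.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 \
  my_streaming_app.py
```

This avoids classpath mistakes like the NoClassDefFoundError above, since the package's transitive Kafka client dependencies are resolved automatically.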
Upvotes: 6