JS G.

Reputation: 158

Kafka with Spark 2.1 Structured Streaming - cannot deserialize

With Apache Spark 2.1, I would like to use Kafka (0.10.0.2.5) as a source for Structured Streaming with pyspark.

In the Kafka topic, I have JSON messages (pushed with StreamSets Data Collector). However, I am not able to read them with the following code:

kafka = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:6667") \
    .option("subscribe", "mytopic") \
    .load()
msg = kafka.selectExpr("CAST(value AS STRING)")
disp = msg.writeStream.outputMode("append").format("console").start()

It generates this error:

 java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer

I tried adding the following options to the readStream call:

.option("value.serializer","org.common.serialization.StringSerializer")
.option("key.serializer","org.common.serialization.StringSerializer")

But it does not solve the problem.

Any idea? Thank you in advance.

Upvotes: 5

Views: 2841

Answers (1)

JS G.

Reputation: 158

Actually, I found the solution: I added the following jar as a dependency:

spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar

(after having downloaded it from https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0)
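For anyone hitting the same error, here is a minimal sketch of how the downloaded jar can be supplied to the PySpark session; the local path and application name below are placeholders, not part of the original setup:

from pyspark.sql import SparkSession

# Hypothetical local path to the downloaded assembly jar -- adjust to your environment
jar_path = "/path/to/spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar"

# spark.jars adds the jar to the driver and executor classpaths
spark = SparkSession.builder \
    .appName("kafka-structured-streaming") \
    .config("spark.jars", jar_path) \
    .getOrCreate()

Passing the jar on the command line instead (e.g. pyspark --jars /path/to/the-assembly.jar, or the same flag on spark-submit) should work equally well, since the point is simply that the Kafka client classes, including ByteArrayDeserializer, end up on the classpath.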

Upvotes: 6
