Reputation: 37
I am using Spark Structured Streaming (version 2.3.2). I need to read from a Kafka cluster and write into Kerberized Kafka. I want to use Kafka itself for offset checkpointing, so that the offset is committed only after the record has been written to the Kerberized Kafka cluster.
Questions: Can we use Kafka for checkpointing to manage offsets, or do we need to use only HDFS/S3?
Please help.
Upvotes: 0
Views: 4514
Reputation: 18525
Can we use Kafka for checkpointing to manage offsets?
No, Structured Streaming does not commit its offsets back to Kafka; Spark tracks the progress of the query itself. This is described in detail here and, of course, in the official Spark Structured Streaming + Kafka Integration Guide.
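A minimal source-side sketch (Scala, with placeholder broker and topic names) of what this means in practice: Spark records the consumed offsets in its own checkpoint, and consumer options such as enable.auto.commit or group.id are not supported by the Kafka source at all.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-to-kerberized-kafka")
  .getOrCreate()

// Source side: Spark tracks the consumed offsets in its own checkpoint.
// Options such as "kafka.enable.auto.commit" or "kafka.group.id" are not
// supported by the Kafka source -- it never commits offsets back to Kafka.
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "source-broker:9092") // placeholder
  .option("subscribe", "input-topic")                      // placeholder
  .option("startingOffsets", "latest")
  .load()
```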
or do we need to use only HDFS/S3?
Yes, this has to be something like HDFS or S3. This is explained in the section Recovering from Failures with Checkpointing of the Structured Streaming Programming Guide: "This checkpoint location has to be a path in an HDFS compatible file system, and can be set as an option in the DataStreamWriter when starting a query."
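Continuing the sketch above, a sink-side example under the same assumptions (placeholder broker, topic and checkpoint path): every "kafka."-prefixed option is passed through to the Kafka producer, which is how the Kerberos/SASL client settings reach the secured cluster, while the checkpoint itself lives on HDFS.

```scala
// Sink side: "kafka."-prefixed options are handed to the Kafka producer,
// carrying the SASL/Kerberos settings to the Kerberized cluster.
// The checkpoint (offsets and commit log) lives on HDFS, not in Kafka.
val query = input
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "secure-broker:9093")      // placeholder
  .option("kafka.security.protocol", "SASL_PLAINTEXT")          // or SASL_SSL
  .option("kafka.sasl.kerberos.service.name", "kafka")
  .option("topic", "output-topic")                              // placeholder
  .option("checkpointLocation", "hdfs:///checkpoints/kafka-to-kafka")
  .start()

query.awaitTermination()
```

Note that the JAAS configuration (principal and keytab) still has to be supplied to the driver and executors separately, for example via java.security.auth.login.config; the exact mechanism depends on your deployment.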
Upvotes: 1