bzak

Reputation: 495

Spark 2.4.0 Structured Streaming Kafka Consumer Checkpointing

I am using Spark 2.4.0 Structured Streaming in batch mode (i.e. spark.read rather than spark.readStream) to consume a Kafka topic. I am checkpointing the read offsets and using the .option("startingOffsets", ...) setting to dictate where to continue reading on the next job run.
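For reference, the read looks roughly like this (a minimal sketch; the bootstrap servers, topic name, and the offset values in the JSON are placeholders, with the real offsets loaded from my checkpoint storage):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-batch-consumer").getOrCreate()

# startingOffsets as a JSON string: one entry per partition of the topic.
# Concrete offsets come from my checkpoint; -2 can be used to mean earliest.
starting_offsets = """{"my-topic": {"0": 1500, "1": 2300}}"""

df = (spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .option("startingOffsets", starting_offsets)
      .option("endingOffsets", "latest")
      .load())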

In the docs it says "Newly discovered partitions during a query will start at earliest." However, testing showed that when a new partition is added and I use the last checkpoint info, I get the following error: Caused by: java.lang.AssertionError: assertion failed: If startingOffsets contains specific offsets, you must specify all TopicPartitions.

How can I check programmatically if any new partitions were created so that I can update my startingOffsets param?

Upvotes: 0

Views: 176

Answers (1)

algorythms

Reputation: 1585

To handle new partitions in Kafka with Spark Structured Streaming, you could try this:

  1. First, fetch the Kafka topic's partitions using the list_topics() function from the AdminClient API (confluent_kafka in Python).
  2. Compare these with the checkpointed offsets.
  3. For new partitions, set the starting offsets to "earliest" or any desired value. For existing partitions, use checkpointed offsets.
  4. Pass these offsets to Spark's startingOffsets option (see the sketch after the AdminClient example below).

Here is an example of using AdminClient:

from confluent_kafka.admin import AdminClient

# Connect to the Kafka cluster's admin API.
admin_client = AdminClient({'bootstrap.servers': 'localhost:9092'})

# list_topics() returns cluster metadata; .topics is a dict mapping
# topic name -> TopicMetadata, which in turn holds the partitions.
topics_metadata = admin_client.list_topics().topics
for topic, metadata in topics_metadata.items():
    print(f"Topic: {topic}")
    for partition in metadata.partitions.values():
        print(f"Partition: {partition.id}")

Upvotes: 0
