Reputation: 9771
The documentation says:
enable.auto.commit: Kafka source doesn’t commit any offset.
Hence my question is, in the event of a worker or partition crash/restart :
This is seems to be quite important. Any indication on how to deal with it ?
Upvotes: 1
Views: 846
Reputation: 2014
I also ran into this issue.
You're right in your observations on the 2 options i.e.
startingOffsets
is set to latest
startingOffsets
is set to earliest
However...
There is the option of checkpointing by adding the following option:
.writeStream
.<something else>
.option("checkpointLocation", "path/to/HDFS/dir")
.<something else>
In the event of a failure, Spark would go through the contents of this checkpoint directory, recover the state before accepting any new data.
I found this useful reference on the same.
Hope this helps!
Upvotes: 3