Reputation: 67
I want to process messages present in a Kafka topic using Kafka Streams.
The last step of the processing is to write the result to a database table. To avoid database contention issues (the program will run 24/7 and process millions of messages), I will be using batching for the JDBC calls.
But this opens up the possibility of losing messages: say I read 500 messages from the topic and Streams marks the offsets, and then the program fails. The messages still sitting in the JDBC batch are lost, yet their offsets are already marked as processed.
I want to manually mark the offset of the last message only once the database insert/update is complete, but that is not possible according to the following question: How to commit manually with Kafka Stream?
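Here is a simplified sketch of what I have in mind (broker address, connection string, topic, table, and column names are just placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;

public class BatchingStreamsApp {
    public static void main(String[] args) throws Exception {
        // JDBC setup (connection string, table and column names are placeholders)
        Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb", "user", "pass");
        conn.setAutoCommit(false);
        PreparedStatement ps = conn.prepareStatement("INSERT INTO results(k, v) VALUES (?, ?)");
        AtomicInteger buffered = new AtomicInteger();

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .foreach((key, value) -> {
                   try {
                       ps.setString(1, key);
                       ps.setString(2, value);
                       ps.addBatch();
                       if (buffered.incrementAndGet() >= 500) { // flush every 500 messages
                           ps.executeBatch();
                           conn.commit();
                           buffered.set(0);
                       }
                   } catch (Exception e) {
                       throw new RuntimeException(e);
                   }
               });
        // Problem: Streams commits offsets on its own schedule, so a crash before
        // executeBatch() loses the buffered rows even though their offsets are committed.

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "batching-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```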
Can someone please suggest a possible solution?
Upvotes: 6
Views: 4020
Reputation: 3842
Kafka Streams doesn't support manual commits, and it doesn't support batch processing either. For your use case, there are a few possibilities:
Use a normal consumer, implement the batch processing yourself, and control the offset commits manually (see the sketch after this list).
Use Spark Structured Streaming with Kafka, as described here: Kafka Spark Structured Stream
Try Spring Kafka.
In this kind of scenario, it is also worth considering the Kafka JDBC connector: Kafka JDBC Connector
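For the first option, here is a minimal sketch (broker address, connection string, topic and table names are placeholders): a plain consumer with auto-commit disabled, where the offsets are committed only after the JDBC batch and the database transaction have succeeded.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BatchingConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "jdbc-batch-writer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb", "user", "pass")) {
            consumer.subscribe(Collections.singletonList("input-topic"));
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement("INSERT INTO results(k, v) VALUES (?, ?)");

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;
                for (ConsumerRecord<String, String> record : records) {
                    ps.setString(1, record.key());
                    ps.setString(2, record.value());
                    ps.addBatch();
                }
                ps.executeBatch();     // write the whole batch ...
                conn.commit();         // ... and make it durable
                consumer.commitSync(); // only then mark the offsets as processed
            }
        }
    }
}
```

If the program crashes before `commitSync()`, the uncommitted messages are simply re-read on restart, so nothing is lost (you may get duplicates instead, which the insert/update logic needs to tolerate).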
Upvotes: 3
Reputation: 15087
As alluded to in @sun007's answer, I'd rather change your approach slightly: use Kafka Streams only for the processing and write its results to an output topic, then use Kafka Connect (e.g., the JDBC sink connector) to ingest that topic into the database.
This decoupling of processing (Kafka Streams) and ingestion (Kafka Connect) is typically a much better design. For example, you no longer couple the processing step with the availability of the database: why should your KStreams application stop if the DB is down? That's an operational concern that shouldn't matter to the processing logic, where you certainly don't want to deal with timeouts, retries, and so on. (Even if you used a tool other than Kafka Streams for processing, this decoupling would still be a preferable setup.)
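A minimal sketch of the processing-only side, with illustrative topic names and a trivial transformation standing in for your real logic:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class ProcessingOnlyApp {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> value.toUpperCase())  // your actual processing goes here
               .to("processed-topic", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "processing-only-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

A JDBC sink connector subscribed to `processed-topic` then handles the database writes, including its own batching, retries, and offset tracking, completely independently of the Streams application.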
Upvotes: 4