Reputation: 485
I am using Kafka-Storm integration. Kafka loads data into a queue, and KafkaSpout pulls the data from it for processing. I have the design below.
Kafka -> Queue -> KafkaSpout -> Process1 Bolt -> Process2 Bolt
The problem is, if the Process2 Bolt takes a long time to process the data, KafkaSpout fails the tuple and tries to read the data from the queue again, which results in duplicate records.
If a Bolt is just processing slowly, why does KafkaSpout treat the tuple as failed? What is the solution? Is there a timeout or similar property I have to set in Storm?
Upvotes: 2
Views: 1929
Reputation: 8171
Storm will fail a tuple if it takes too long to process, by default 30 seconds. Since Storm guarantees processing, once the tuple fails the Kafka spout will replay the same message until the tuple is successfully processed.
A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.
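As a sketch, you can raise that timeout when building the topology so slow bolts don't trigger replays. The timeout value (60 seconds) and the topology/component names here are placeholders; adjust them to your setup. Raising Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS covers the whole tuple tree, i.e. the time from the spout emitting the tuple until every downstream bolt has acked it:

```java
import org.apache.storm.Config;

Config conf = new Config();

// Give the full tuple tree (KafkaSpout -> Process1 -> Process2) up to
// 60 seconds before Storm fails the tuple and the spout replays it.
conf.setMessageTimeoutSecs(60);

// Equivalent to:
// conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60);

// Optionally also cap how many tuples may be in flight per spout task,
// so a slow Process2 Bolt doesn't get flooded and blow the timeout:
conf.setMaxSpoutPending(500);
```

Note that only bolts that ack (or anchor and emit) their input tuples participate in the tuple tree, so make sure Process1 and Process2 call `collector.ack(tuple)` when they finish; otherwise the tuple times out regardless of this setting.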
Upvotes: 3