I'm using Apache Storm to process a large volume of data coming from a Kafka spout. More than 3k JSON messages have already been published to Kafka, and more keep arriving. I have to process every message from the beginning of the topic, so I have set a Kafka spout parameter accordingly.
This results in a large number of failed tuples, as reported in the Storm UI.
I suspect Storm cannot handle all of these messages being pushed at it in a single burst.
Any help is appreciated.
You can either:
1) increase the parallelism hint for the bolts so that no backlog builds up and delays the processing of the tuples emitted by the spout, or
2) use the topology.max.spout.pending property to limit the number of tuples the spout can have in flight (emitted but not yet acked or failed) before it has to wait for one of them to complete.
Try a combination of both. In production you usually need several iterations to find suitable values for both settings (parallelism and topology.max.spout.pending).
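The trade-off between these two settings can be illustrated with a small, self-contained Java model. This is not Storm code. It assumes, as a likely explanation for the failures in the question, that a tuple is failed when it takes longer than Storm's message timeout (topology.message.timeout.secs, 30 s by default) to be fully processed after the spout emits it. The 3000-message backlog comes from the question. The class name SpoutPendingSim, the 50 ms per-tuple cost and the specific parallelism/pending values are hypothetical. In a real topology the two knobs correspond to the parallelism argument of TopologyBuilder.setBolt(...) and to Config.setMaxSpoutPending(...).

```java
import java.util.PriorityQueue;

/**
 * Toy discrete-time model (hypothetical, NOT Storm code) of a spout draining
 * a backlog into a single bolt. It only illustrates how bolt parallelism and
 * a cap on pending (in-flight) tuples affect message timeouts.
 */
final class SpoutPendingSim {

    /** Tuples completed within the timeout vs. after it. */
    record Result(int acked, int failed) {}

    /**
     * @param backlog     messages already waiting in Kafka
     * @param parallelism number of bolt executors (the parallelism hint)
     * @param maxPending  cap on in-flight tuples; 0 means no cap
     * @param processMs   assumed time one executor spends on one tuple
     * @param timeoutMs   message timeout measured from spout emit
     */
    static Result simulate(int backlog, int parallelism, int maxPending,
                           long processMs, long timeoutMs) {
        long[] finishAt = new long[backlog];
        // Earliest time at which each bolt executor becomes free.
        PriorityQueue<Long> executorFreeAt = new PriorityQueue<>();
        for (int e = 0; e < parallelism; e++) {
            executorFreeAt.add(0L);
        }

        int acked = 0;
        int failed = 0;
        for (int i = 0; i < backlog; i++) {
            // With a cap, tuple i is emitted only once tuple i - maxPending has
            // completed (completions happen in FIFO order in this model).
            long emitAt = (maxPending > 0 && i >= maxPending)
                    ? finishAt[i - maxPending]
                    : 0L;
            // The tuple waits in the bolt's queue until an executor is free.
            long start = Math.max(emitAt, executorFreeAt.poll());
            finishAt[i] = start + processMs;
            executorFreeAt.add(finishAt[i]);

            // Emit-to-completion time decides ack vs. timeout failure.
            if (finishAt[i] - emitAt > timeoutMs) {
                failed++;
            } else {
                acked++;
            }
        }
        return new Result(acked, failed);
    }

    public static void main(String[] args) {
        int backlog = 3000;       // from the question
        long processMs = 50;      // hypothetical per-tuple cost
        long timeoutMs = 30_000;  // 30 s message timeout

        // {parallelism, maxPending}; 0 = no cap on pending tuples
        int[][] settings = { {1, 0}, {2, 0}, {1, 100}, {4, 500} };
        for (int[] s : settings) {
            Result r = simulate(backlog, s[0], s[1], processMs, timeoutMs);
            System.out.printf("parallelism=%d maxPending=%s -> acked=%d failed=%d%n",
                    s[0], s[1] == 0 ? "unset" : String.valueOf(s[1]),
                    r.acked(), r.failed());
        }
    }
}
```

With these assumed numbers, emitting the whole backlog at once with one executor leaves 2400 of the 3000 tuples timing out, and doubling the parallelism alone still leaves 1800 failures. A cap of 100 pending tuples removes the timeouts without adding executors, and combining 4 executors with a cap of 500 also finishes sooner. The model ignores replays of failed tuples and the spout's own cost, so it only shows the direction of the effect, not real values; those still have to be found by iterating on the actual topology.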