Fangchao Gao

Reputation: 1

Storm message failed

Recently I ran into a really strange problem. The Storm cluster has 3 machines, and the topology is structured like this: Kafka Spout A -> Bolt B -> Bolt C. I ack every tuple in every bolt, even when an exception might be thrown inside the bolt (in each bolt's execute method I catch all exceptions and then ack the tuple). But here is the strange part. I logged the spout, and on one machine all tuples are acked at the spout, while on the other 2 machines almost all tuples fail. After 60 seconds the failed tuples are replayed, again and again and again. "Almost" means that at the beginning all tuples failed on the other 2 machines; after a while, a small number of tuples were acked on those 2 machines.
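For reference, here is a minimal plain-Java sketch of the "catch everything, then ack" pattern described above; it is not the asker's actual code. `TupleCollector` is a hypothetical stand-in for Storm's `OutputCollector` (only `ack` is modeled), and the ack sits in a `finally` block so it runs whether or not processing throws. Note that this only guarantees the ack is sent; it does not guarantee the ack arrives before the message timeout.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of a bolt that swallows exceptions and always acks. */
public class AckInFinallySketch {

    /** Hypothetical stand-in for Storm's OutputCollector; only ack() is modeled. */
    interface TupleCollector {
        void ack(String tupleId);
    }

    /** Records which tuple ids were acked, so the behavior can be checked. */
    static class RecordingCollector implements TupleCollector {
        final List<String> acked = new ArrayList<>();

        @Override
        public void ack(String tupleId) {
            acked.add(tupleId);
        }
    }

    private final TupleCollector collector;

    AckInFinallySketch(TupleCollector collector) {
        this.collector = collector;
    }

    /** Mirrors a bolt's execute(): process, swallow any exception, always ack. */
    void execute(String tupleId, String payload) {
        try {
            process(payload);
        } catch (Exception e) {
            // Log and swallow, as the question describes.
            System.err.println("processing failed for " + tupleId + ": " + e.getMessage());
        } finally {
            // Runs whether or not process() threw, so each tuple is acked exactly once.
            collector.ack(tupleId);
        }
    }

    /** Hypothetical business logic: rejects empty input. */
    private void process(String payload) {
        if (payload.isEmpty()) {
            throw new IllegalArgumentException("empty payload");
        }
    }

    public static void main(String[] args) {
        RecordingCollector rc = new RecordingCollector();
        AckInFinallySketch bolt = new AckInFinallySketch(rc);
        bolt.execute("t1", "hello");
        bolt.execute("t2", ""); // throws inside process(), still acked
        System.out.println(rc.acked); // [t1, t2]
    }
}
```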

The tuples are definitely failing because of timeouts, but I really don't know why they time out. According to the logs I've printed, I'm sure that every bolt acks every tuple at the end of its execute method. So why do some of the tuples fail on those 2 machines?

Is there anything I can do to find out what's wrong with the topology or the Storm cluster? Thanks in advance for any help.

Upvotes: 0

Views: 285

Answers (1)

Daniccan

Reputation: 2795

Your problem is related to how the KafkaSpout handles backpressure in your topology.

You can apply backpressure to the KafkaSpout by setting maxSpoutPending in the topology configuration (the example below also raises the message timeout to 100 seconds):

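```java
import org.apache.storm.Config;         // Storm 1.x+; older releases use backtype.storm
import org.apache.storm.StormSubmitter;

Config config = new Config();
config.setMaxSpoutPending(200);      // at most 200 unacked tuples per spout task
config.setMessageTimeoutSecs(100);   // fail tuple trees not fully acked within 100 s
// builder is the TopologyBuilder wiring Kafka Spout A -> Bolt B -> Bolt C
StormSubmitter.submitTopology("testtopology", config, builder.createTopology());
```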

maxSpoutPending is the maximum number of tuples (per spout task) that can be pending acknowledgement at any given time. When it is set, the KafkaSpout stops consuming data from Kafka whenever the unacknowledged tuple count reaches maxSpoutPending, and resumes once the count drops below that value.
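The throttling described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Storm's implementation: in Storm the cap applies per spout task, only to tuples emitted with a message id, and the executor simply stops calling the spout's `nextTuple()` while the pending count is at the cap. `PendingCapModel`, `tryEmit`, and the integer tuple ids are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a max-spout-pending cap; not Storm's actual code. */
public class PendingCapModel {
    private final int maxPending;
    private final Set<Long> pending = new HashSet<>();
    private long nextId = 0;

    PendingCapModel(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Emits a new tuple id if under the cap; returns -1 if the spout must wait. */
    long tryEmit() {
        if (pending.size() >= maxPending) {
            return -1; // cap reached: a real spout would not be asked for more data
        }
        long id = nextId++;
        pending.add(id);
        return id;
    }

    /** An ack frees a slot. */
    void ack(long id) {
        pending.remove(id);
    }

    /** A fail also frees a slot; a real KafkaSpout would additionally schedule a replay. */
    void fail(long id) {
        pending.remove(id);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingCapModel spout = new PendingCapModel(2);
        long a = spout.tryEmit();
        long b = spout.tryEmit();
        long c = spout.tryEmit(); // cap reached, spout waits
        spout.ack(a);
        long d = spout.tryEmit(); // the ack freed a slot
        System.out.println(a + " " + b + " " + c + " " + d + " pending=" + spout.pendingCount());
        // prints: 0 1 -1 2 pending=2
    }
}
```

Because the cap is per spout task, the total number of in-flight tuples in the topology is roughly maxSpoutPending multiplied by the number of spout tasks.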

Also, tune your bolts to be as lightweight as possible so that tuples are acknowledged before they time out.
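Why bolt cost matters: Storm's message timeout is measured from the moment the spout emits a tuple, so time the tuple spends waiting in bolt input queues counts against it, even if the bolt eventually acks it. The back-of-the-envelope check below relates maxSpoutPending, per-tuple bolt latency, and the timeout. It assumes a deliberately simplified model (one bolt executor processing tuples serially, so the last of N pending tuples waits about N times the per-tuple cost); the 400 ms figure and the helper names are hypothetical.

```java
/** Rough timeout budget under a simplified serial-processing assumption. */
public class TimeoutBudget {

    /** Approximate worst-case time (ms) until the last of `pending` tuples is acked. */
    static long worstCaseAckMillis(int pending, long perTupleMillis) {
        return (long) pending * perTupleMillis;
    }

    /** Largest pending count whose worst case still fits inside the timeout. */
    static int maxSafePending(int timeoutSecs, long perTupleMillis) {
        if (perTupleMillis <= 0) {
            throw new IllegalArgumentException("perTupleMillis must be positive");
        }
        return (int) ((timeoutSecs * 1000L) / perTupleMillis);
    }

    public static void main(String[] args) {
        // Values from the config above, plus a hypothetical 400 ms of bolt work per tuple.
        int maxSpoutPending = 200;
        int timeoutSecs = 100;
        long perTupleMillis = 400;

        long worst = worstCaseAckMillis(maxSpoutPending, perTupleMillis);
        System.out.println("worst case: " + worst + " ms, timeout: " + timeoutSecs * 1000 + " ms");
        // prints: worst case: 80000 ms, timeout: 100000 ms
        System.out.println("max safe pending: " + maxSafePending(timeoutSecs, perTupleMillis));
        // prints: max safe pending: 250
    }
}
```

Under this model, making each bolt cheaper (or adding executors so tuples are processed in parallel) shrinks the queueing delay, which is what keeps tuples inside the timeout window.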

Upvotes: 1
