ab11

Reputation: 20090

Apache Storm: what happens to a tuple when no bolts are available to consume it?

Suppose a tuple is bound for another bolt, but no instance of that bolt is available for a while. How long will the tuple hang around? Indefinitely? Only for some fixed window?

And what if many tuples are waiting in a queue for the next available bolt? Will they be merged? Will bad things happen if too many back up?

Upvotes: 1

Views: 1253

Answers (3)

taegeonum

Reputation: 81

Storm simply drops a tuple if it is not fully processed before the timeout expires (the default is 30 seconds).

After that, Storm calls the fail(Object msgId) method of the spout. If you want to replay failed tuples, you must implement this method yourself: keep the emitted tuples in memory, or in a reliable storage system such as Kafka, so they can be re-emitted.

If you do not implement fail(Object msgId), Storm just drops the failed tuples.
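As a minimal sketch of the pattern above (this is a plain-Java illustration, not Storm's actual spout interface; the class and method names are hypothetical): a reliable spout keeps each emitted tuple in a pending map keyed by its msgId, removes it on ack, and re-emits it from that map on fail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the bookkeeping a reliable spout needs so that
// fail(msgId) can replay a tuple. Not Storm's API: a standalone simulation.
public class ReplaySketch {
    private final Map<Long, String> pending = new HashMap<>(); // not yet acked
    private final List<String> emitted = new ArrayList<>();    // emission log
    private long nextId = 0;

    public long emit(String tuple) {
        long msgId = nextId++;
        pending.put(msgId, tuple); // keep a copy until the tuple is acked
        emitted.add(tuple);
        return msgId;
    }

    public void ack(long msgId) {
        pending.remove(msgId);     // fully processed: safe to forget
    }

    public void fail(long msgId) {
        String tuple = pending.get(msgId); // timed out or failed downstream
        if (tuple != null) {
            emitted.add(tuple);    // replay from the pending map
        }
    }

    public List<String> emittedTuples() {
        return emitted;
    }

    public static void main(String[] args) {
        ReplaySketch spout = new ReplaySketch();
        long a = spout.emit("tuple-A");
        long b = spout.emit("tuple-B");
        spout.ack(a);  // A completed
        spout.fail(b); // B timed out: re-emitted from the pending map
        System.out.println(spout.emittedTuples()); // [tuple-A, tuple-B, tuple-B]
    }
}
```

Without the pending map (i.e. with an empty fail method), a failed tuple is simply gone, which matches the drop behaviour described above.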

Reference: https://storm.apache.org/documentation/Guaranteeing-message-processing.html

Upvotes: 1

SQL.injection

Reputation: 2647

  1. By default, tuples time out 30 seconds after being emitted. You can change this with topology.message.timeout.secs, but unless you know what you are doing, don't.
  2. Failed and timed-out tuples will be replayed by the spout, provided the spout reads from a reliable data source (e.g. Kafka); this is how Storm guarantees message processing. If you are coding your own spouts, you may want to dig deep into this.
  3. You can see on the Storm UI whether tuples are timing out: they fail on the spout but not on the bolts.
  4. You don't want tuples to time out inside your topology (for example, Kafka pays a performance penalty for non-sequential reads). Adjust the topology's capacity to process tuples, i.e. tweak bolt parallelism by changing the number of executors, and set topology.max.spout.pending to a reasonably conservative value.
  5. Increasing topology.message.timeout.secs is no real solution: sooner or later, if the capacity of your topology is not enough, tuples will start to fail anyway.
  6. topology.max.spout.pending is the maximum number of tuples that can be in flight. The spout emits more tuples only as long as the number of tuples not yet fully processed is below this value. Note that the parameter is per spout: each spout has its own internal counter and keeps track of its tuples that are not fully processed.
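The two knobs from points 1 and 6 can be set in storm.yaml (or per topology via the Config object); the values here are illustrative, not recommendations:

```yaml
# storm.yaml (illustrative values)
topology.message.timeout.secs: 30   # default; tuples not fully acked within this window are failed
topology.max.spout.pending: 1000    # cap on in-flight (not fully processed) tuples, per spout
```

Lowering topology.max.spout.pending throttles the spout so a slow bolt causes backpressure at the source instead of a growing pile of tuples that eventually time out.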

Upvotes: 2

Mathew Lin

Reputation: 41

There is a deserialize queue that buffers incoming tuples. If processing stalls long enough, the queue fills up, and tuples will be lost unless you use the ack mechanism to make sure they are resent.

Upvotes: 1
