ab11

Reputation: 20090

Storm, any way to log how many tuples in flight?

As part of my tuning, I've been adjusting the maxSpoutPending parameter. However, it would be nice to know how many tuples are in the topology at any given time, so I could tell how much of an impact this parameter is having on my topologies' performance.

I dug around in the source but didn't find anything. Is this a value I can find in the Storm UI? Or possibly I can override something somewhere to log this value?

Upvotes: 2

Views: 592

Answers (3)

Morgan Kenyon

Reputation: 3172

You said you're looking for insight into the effectiveness of the maxSpoutPending setting.

Working with the KafkaSpout provided by Storm (I've modified the source code to add more logging to see what's happening), the next() method gets called all the time. So I've always seen a relatively fast turnaround (about 1 ms or less) between a tuple getting ack'd or failed (which reduces the pending count) and a new tuple being sent into the topology (which hits the max pending limit again). Logs from today show the timestamps of a tuple getting ack'd and then another one being sent out:

2015-10-16T12:20:15.162-0500 s.k.PartitionManager [INFO] PM! 6 - ack
2015-10-16T12:20:15.163-0500 s.k.PartitionManager [INFO] PM! 177 - next

2015-10-16T12:20:15.400-0500 s.k.PartitionManager [INFO] PM! 10 - ack
2015-10-16T12:20:15.401-0500 s.k.PartitionManager [INFO] PM! 178 - next

2015-10-16T12:20:15.649-0500 s.k.PartitionManager [INFO] PM! 22 - ack
2015-10-16T12:20:15.649-0500 s.k.PartitionManager [INFO] PM! 180 - next

2015-10-16T12:20:16.511-0500 s.k.PartitionManager [INFO] PM! 27 - ack
2015-10-16T12:20:16.512-0500 s.k.PartitionManager [INFO] PM! 182 - next 

This shows fairly instantaneous turnaround. So for my use case, there are pretty much always maxSpoutPending tuples in flight in my topology.
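To make the turnaround in the log excerpt above concrete, here is a minimal sketch that parses the timestamps of an ack line and the following next line and prints the gap. The class and method names (AckTurnaround, turnaroundMillis) are made up for illustration, and it assumes the timestamp is always the first token on the line, as in the excerpt.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Storm or the KafkaSpout.
class AckTurnaround {
    // Timestamp layout seen in the log excerpt, e.g. 2015-10-16T12:20:15.162-0500
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    /** Milliseconds between an "ack" log line and the "next" log line that follows it. */
    static long turnaroundMillis(String ackLine, String nextLine) {
        return Duration.between(timestampOf(ackLine), timestampOf(nextLine)).toMillis();
    }

    // Assumption: the timestamp is the first whitespace-delimited token of the line.
    private static OffsetDateTime timestampOf(String logLine) {
        String token = logLine.trim().split("\\s+")[0];
        return OffsetDateTime.parse(token, LOG_TS);
    }

    public static void main(String[] args) {
        String ack = "2015-10-16T12:20:15.162-0500 s.k.PartitionManager [INFO] PM! 6 - ack";
        String next = "2015-10-16T12:20:15.163-0500 s.k.PartitionManager [INFO] PM! 177 - next";
        System.out.println(turnaroundMillis(ack, next) + " ms");
    }
}
```

The log timestamps only have millisecond resolution, so a gap this small is an upper bound rather than a precise measurement.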

My tuples also aren't processed very quickly (~1 sec each), so I couldn't say how this holds up for tuples that get processed much faster or for other types of spouts.

Upvotes: 1

Matthias J. Sax

Reputation: 62330

It depends on what you mean by "how many tuples are in the topology".

  1. If you want to know how many tuples that the spout emitted are not completely processed yet, you can simply take the difference of "Spout emitted" and "Spout acked" from the Storm UI. You can also obtain those values via client.getTopologyInfo("topologyName"), with client = NimbusClient.getConfiguredClient(...).
  2. If you want to know all tuples over all stages in the topology (i.e., in all buffers for each spout/bolt), it might be quite tricky... TopologyInfo might still be helpful, but I am not sure if/how to compute the value you want to know.
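As a rough sketch of the first option, the arithmetic looks like this. The class and method names are hypothetical, and the counter values are assumed to have been read off the Storm UI (or TopologyInfo) by hand; the sketch does not call any Storm API. It also subtracts failed tuples, on the assumption that a failed tuple frees a pending slot just like an acked one.

```java
// Hypothetical helper: plain arithmetic on counters copied from the Storm UI.
class InFlightEstimate {
    /**
     * Estimates spout tuples that were emitted but are neither acked nor failed yet.
     * The result is clamped at zero because the counters are only approximate
     * (assumption: they may be sampled), so the raw difference can dip below zero.
     */
    static long inFlight(long emitted, long acked, long failed) {
        return Math.max(0L, emitted - acked - failed);
    }

    public static void main(String[] args) {
        // Made-up counter values, for illustration only.
        long emitted = 10_000L;
        long acked = 9_480L;
        long failed = 20L;
        System.out.println("approx. in flight: " + inFlight(emitted, acked, failed));
    }
}
```

Comparing this estimate against the configured maxSpoutPending value gives a feel for whether the limit is actually being reached.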

Upvotes: 1

SQL.injection

Reputation: 2647

Given that you have enough messages in your source, you can force the spout to read from the beginning and see how many tuples you can process in 10 minutes (and with basic math you can obtain the number of tuples per second).

For example with a kafka spout you can do the following:

    SpoutConfig spoutConfig = new SpoutConfig(
        // your spout config
    );
    // this is how you tell the spout to read from the oldest kafka offset
    spoutConfig.forceFromStart = true;
    KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

And then let the topology run for 15 minutes and see how many tuples it processed in the last 10 minutes, leaving out the first 5 minutes while the topology warms up.
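The "basic math" is just the tuple count divided by the length of the measurement window. A minimal sketch, with a hypothetical class name and a made-up tuple count:

```java
// Hypothetical helper for turning a tuple count over a time window into a rate.
class Throughput {
    /** Average tuples per second over a window of the given length in seconds. */
    static double tuplesPerSecond(long tuplesProcessed, long windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        return (double) tuplesProcessed / windowSeconds;
    }

    public static void main(String[] args) {
        long processedInLastTenMinutes = 1_200_000L; // made-up value, for illustration only
        long windowSeconds = 10 * 60;                // the 10-minute measurement window
        System.out.println(tuplesPerSecond(processedInLastTenMinutes, windowSeconds)
                + " tuples/sec");
    }
}
```

Repeating the run with different maxSpoutPending values and comparing the resulting rates shows how much the setting affects throughput.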

Upvotes: 0
