Vivek Rao

Reputation: 586

Storm topology performance hit when acking

I'm using this tool from Yahoo to run some performance tests on my Storm cluster: https://github.com/yahoo/storm-perf-test

I see almost a 10x performance hit when I turn acking on. Here are the details needed to reproduce the test. Cluster: 3 supervisor nodes and 1 nimbus node. Each node is a c3.large.

With acking:

bin/storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --ack --boltParallel 60 --maxSpoutPending 100 --messageSizeByte 100 --name some-topo --numWorkers 9 --spoutParallel 20 --testTimeSec 100 --pollFreqSec 20 --numLevels 2

status   topologies  totalSlots  slotsUsed  totalExecutors  executorsWithMetrics  time           time-diff ms  transferred  throughput (MB/s)
WAITING  1           3           0          141             0                     1424707134585  0             0            0.0
WAITING  1           3           3          141             141                   1424707154585  20000         24660        0.11758804321289062
WAITING  1           3           3          141             141                   1424707174585  20000         17320        0.08258819580078125
RUNNING  1           3           3          141             141                   1424707194585  20000         13880        0.06618499755859375
RUNNING  1           3           3          141             141                   1424707214585  20000         21720        0.10356903076171875
RUNNING  1           3           3          141             141                   1424707234585  20000         43220        0.20608901977539062
RUNNING  1           3           3          141             141                   1424707254585  20000         35520        0.16937255859375
RUNNING  1           3           3          141             141                   1424707274585  20000         33820        0.16126632690429688

Without acking:

bin/storm jar ~/target/storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --boltParallel 60 --maxSpoutPending 100 --messageSizeByte 100 --name some-topo --numWorkers 9 --spoutParallel 20 --testTimeSec 100 --pollFreqSec 20 --numLevels 2

status   topologies  totalSlots  slotsUsed  totalExecutors  executorsWithMetrics  time           time-diff ms  transferred  throughput (MB/s)
WAITING  1           3           0          140             0                     1424707374386  0             0            0.0
WAITING  1           3           3          140             140                   1424707394386  20000         565460       2.6963233947753906
WAITING  1           3           3          140             140                   1424707414386  20000         1530680      7.298851013183594
RUNNING  1           3           3          140             140                   1424707434386  20000         3280760      15.643882751464844
RUNNING  1           3           3          140             140                   1424707454386  20000         3308000      15.773773193359375
RUNNING  1           3           3          140             140                   1424707474386  20000         4367260      20.824718475341797
RUNNING  1           3           3          140             140                   1424707494386  20000         4489000      21.40522003173828
RUNNING  1           3           3          140             140                   1424707514386  20000         5058960      24.123001098632812

The last two columns are the important ones: they show the number of tuples transferred and the throughput in MB/s.

Is this kind of performance hit expected with Storm when acking is turned on? I'm using version 0.9.3 and no advanced networking.
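As a sanity check on those two columns, here is a small Python sketch of the arithmetic. It assumes (this is inferred from the numbers, not taken from the tool's source) that throughput is the transferred count times the 100-byte message size, divided by the 20-second poll interval, with 1 MB = 1024 * 1024 bytes. The reported values are consistent with that. The function and variable names are made up for illustration.

```python
MESSAGE_SIZE_BYTES = 100  # --messageSizeByte from the commands above
POLL_INTERVAL_S = 20      # --pollFreqSec; matches the 20000 ms time-diff column


def throughput_mb_per_s(transferred, message_size=MESSAGE_SIZE_BYTES,
                        interval_s=POLL_INTERVAL_S):
    """Convert a per-interval tuple count to MB/s (1 MB = 1024 * 1024 bytes)."""
    return transferred * message_size / interval_s / (1024 * 1024)


# "transferred" value of the last RUNNING row in each run.
acked_transferred = 33820
unacked_transferred = 5058960

acked_mb_s = throughput_mb_per_s(acked_transferred)
unacked_mb_s = throughput_mb_per_s(unacked_transferred)

# Ratio of the two runs on their final samples.
slowdown = unacked_transferred / acked_transferred

print(f"with acking:    {acked_mb_s:.4f} MB/s")
print(f"without acking: {unacked_mb_s:.4f} MB/s")
print(f"ratio:          {slowdown:.1f}x")
```

On these final samples the ratio comes out well above 10x, so the gap in this particular pair of runs is larger than the rough figure quoted in the text. Other rows give different ratios, since the acked run fluctuates a lot between intervals.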

Upvotes: 2

Views: 1257

Answers (1)

P. Taylor Goetz

Reputation: 1141

There is always going to be a certain degree of performance degradation with acking enabled -- it's the price you pay for reliability. Throughput will ALWAYS be higher with acking disabled, but you have no guarantee whether your data is processed or dropped on the floor. Whether that's a 10x hit like you're seeing, or significantly less, is a matter of tuning.

One important setting is topology.max.spout.pending, which allows you to throttle spouts so that only that many tuples are allowed "in flight" at any given time. That setting is useful for making sure downstream bolts don't get overwhelmed and start timing out tuples.

That setting also has no effect with acking disabled -- it's like opening the flood gates and dropping any data that overflows. So again, it will always be faster.

With acking enabled, Storm will make sure everything gets processed at least once, but you need to tune topology.max.spout.pending appropriately for your use case. Since every use case is different, this is a matter of trial and error. Set it too low, and you will have low throughput. Set it too high and your downstream bolts will get overwhelmed, tuples will time out, and you will get replays.

To illustrate, set --maxSpoutPending to 1 and run the benchmark again. Then try 1000.
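A rough way to see why the pending limit caps throughput is Little's law: the number of tuples in flight equals throughput times the time each tuple stays in flight. The Python sketch below is a back-of-the-envelope model, not a measurement. It assumes one spout task per spout executor (so 20 tasks, from --spoutParallel 20), relies on topology.max.spout.pending being applied per spout task, and uses a purely hypothetical complete latency of 50 ms. The function name is made up for illustration.

```python
def max_acked_throughput(spout_tasks, max_spout_pending, complete_latency_s):
    """Little's law upper bound on acked tuples per second.

    At most spout_tasks * max_spout_pending tuples can be in flight, and each
    one stays in flight for complete_latency_s before it is acked.
    """
    in_flight_cap = spout_tasks * max_spout_pending
    return in_flight_cap / complete_latency_s


SPOUT_TASKS = 20          # --spoutParallel from the question (assumes 1 task per executor)
ASSUMED_LATENCY_S = 0.05  # hypothetical complete latency, NOT measured

for pending in (1, 100, 1000):
    bound = max_acked_throughput(SPOUT_TASKS, pending, ASSUMED_LATENCY_S)
    print(f"maxSpoutPending={pending:>4}: at most {bound:,.0f} tuples/s")
```

The model treats latency as constant, which it is not in practice: as the pending limit grows, tuples queue up in the bolts, complete latency rises, and past the message timeout they are replayed. That is why the bound scales linearly here while real throughput levels off or degrades, and why the right value has to be found by experiment.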

So yes, a 10x performance hit is possible without proper tuning. If data loss is okay for your use case, turn acking off. But if you need reliable processing, turn it on, tune for your use case, and scale horizontally (add more nodes) to reach your throughput requirements.

Upvotes: 4
