Reputation: 586
I'm using this tool from Yahoo to run some performance tests on my Storm cluster - https://github.com/yahoo/storm-perf-test
I notice almost a 10x performance hit when I turn acking on. Here are some details to reproduce the test - Cluster: 3 supervisor nodes and 1 nimbus node, each a c3.large.
With acking -
bin/storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --ack --boltParallel 60 --maxSpoutPending 100 --messageSizeByte 100 --name some-topo --numWorkers 9 --spoutParallel 20 --testTimeSec 100 --pollFreqSec 20 --numLevels 2
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 3 0 141 0 1424707134585 0 0 0.0
WAITING 1 3 3 141 141 1424707154585 20000 24660 0.11758804321289062
WAITING 1 3 3 141 141 1424707174585 20000 17320 0.08258819580078125
RUNNING 1 3 3 141 141 1424707194585 20000 13880 0.06618499755859375
RUNNING 1 3 3 141 141 1424707214585 20000 21720 0.10356903076171875
RUNNING 1 3 3 141 141 1424707234585 20000 43220 0.20608901977539062
RUNNING 1 3 3 141 141 1424707254585 20000 35520 0.16937255859375
RUNNING 1 3 3 141 141 1424707274585 20000 33820 0.16126632690429688
Without acking -
bin/storm jar ~/target/storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --boltParallel 60 --maxSpoutPending 100 --messageSizeByte 100 --name some-topo --numWorkers 9 --spoutParallel 20 --testTimeSec 100 --pollFreqSec 20 --numLevels 2
status topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time time-diff ms transferred throughput (MB/s)
WAITING 1 3 0 140 0 1424707374386 0 0 0.0
WAITING 1 3 3 140 140 1424707394386 20000 565460 2.6963233947753906
WAITING 1 3 3 140 140 1424707414386 20000 1530680 7.298851013183594
RUNNING 1 3 3 140 140 1424707434386 20000 3280760 15.643882751464844
RUNNING 1 3 3 140 140 1424707454386 20000 3308000 15.773773193359375
RUNNING 1 3 3 140 140 1424707474386 20000 4367260 20.824718475341797
RUNNING 1 3 3 140 140 1424707494386 20000 4489000 21.40522003173828
RUNNING 1 3 3 140 140 1424707514386 20000 5058960 24.123001098632812
The last 2 columns are the really important ones. They show the number of tuples transferred (per polling interval) and the throughput in MB/s.
Is this kind of performance hit expected with Storm when acking is turned on? I'm using version 0.9.3 with no advanced networking.
Upvotes: 2
Views: 1257
Reputation: 1141
There is always going to be a certain degree of performance degradation with acking enabled -- it's the price you pay for reliability. Throughput will ALWAYS be higher with acking disabled, but you have no guarantee whether your data is processed or dropped on the floor. Whether that's a 10x hit like you're seeing, or significantly less, is a matter of tuning.
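To see where the overhead comes from: with acking on, every bolt anchors its outgoing tuples to the input and acks the input, and Storm's acker executors track the whole tuple tree for each spout tuple. A rough sketch of what that looks like in a bolt (a hypothetical PassThroughBolt, not code from the benchmark), using the 0.9.x API:

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical bolt, not part of storm-perf-test: shows the anchoring and
    // acking bookkeeping that the --ack flag effectively turns on.
    public class PassThroughBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // Anchored emit: the new tuple is tied to the input's tuple tree,
            // so the acker has to track it.
            collector.emit(input, new Values(input.getValue(0)));
            // Ack marks the input as fully processed; a failure or timeout
            // instead causes a replay from the spout.
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }

Every anchored emit and ack also generates acker traffic between workers, which is a large part of the throughput difference you're measuring.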
One important setting is topology.max.spout.pending, which allows you to throttle spouts so that only that many tuples are allowed "in flight" at any given time. That setting is useful for making sure downstream bolts don't get overwhelmed and start timing out tuples.
That setting also has no effect with acking disabled -- it's like opening the floodgates and dropping any data that overflows. So again, running without acking will always be faster.
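If you are building your own topology rather than using the benchmark's --maxSpoutPending flag, the same knob is set through the topology config. A minimal sketch, assuming the Storm 0.9.x API and a placeholder value of 500 that would need tuning:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class TuningExample {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // ... setSpout(...) / setBolt(...) wiring for your topology goes here ...

            Config conf = new Config();
            conf.setNumWorkers(9);
            // Same setting as topology.max.spout.pending: caps the number of
            // un-acked tuples in flight per spout task. It only takes effect
            // when acking is enabled. 500 is a placeholder, not a recommendation.
            conf.setMaxSpoutPending(500);

            StormSubmitter.submitTopology("some-topo", conf, builder.createTopology());
        }
    }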
With acking enabled, Storm will make sure everything gets processed at least once, but you need to tune topology.max.spout.pending appropriately for your use case. Since every use case is different, this is a matter of trial and error. Set it too low, and you will have low throughput. Set it too high, and your downstream bolts will get overwhelmed, tuples will time out, and you will get replays.
To illustrate, set maxSpoutPending to 1 and run the benchmark again. Then try 1000.
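For example, re-using the command from the question and changing only the --maxSpoutPending value:

    bin/storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --ack --boltParallel 60 --maxSpoutPending 1 --messageSizeByte 100 --name some-topo --numWorkers 9 --spoutParallel 20 --testTimeSec 100 --pollFreqSec 20 --numLevels 2

then the same command with --maxSpoutPending 1000. Comparing the transferred and throughput columns between the two runs shows how sensitive an acked topology is to this single setting.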
So yes, a 10x performance hit is possible without proper tuning. If data loss is okay for your use case, turn acking off. But if you need reliable processing, turn it on, tune for your use case, and scale horizontally (add more nodes) to reach your throughput requirements.
Upvotes: 4