Reputation: 73
I am playing with Apache Storm for a real-time image processing application which requires ultra low latency. In the topology definition, a single spout emits a raw image (~5MB) every 1s, and a few bolts process it. The processing latency of each bolt is acceptable, and the overall computing delay is around 150ms.
However, I find that the message-passing delay between workers on different nodes is really high: summed over the 5 successive bolts it is around 200ms. To calculate this delay, I subtract all the task latencies from the end-to-end latency. Moreover, I implemented a timer bolt with which the processing bolts register, recording a timestamp just before their real processing starts. Comparing the timestamps of consecutive bolts confirms that the delay between each pair of bolts is as high as I previously noticed.
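To make the measurement concrete, here is a minimal sketch (plain Java, no Storm dependency; the class and method names are mine, not Storm APIs) of the bookkeeping the timer bolt does: each bolt records its arrival timestamp for a given image, and the per-hop delay is the difference of consecutive timestamps.

```java
import java.util.ArrayList;
import java.util.List;

public class HopLatency {
    // Given the arrival timestamps (ms) recorded by successive bolts for one
    // image, return the per-hop delay (processing + messaging) between bolts.
    public static List<Long> hopDelaysMillis(List<Long> arrivalTimestamps) {
        List<Long> delays = new ArrayList<>();
        for (int i = 1; i < arrivalTimestamps.size(); i++) {
            delays.add(arrivalTimestamps.get(i) - arrivalTimestamps.get(i - 1));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Example: arrival times recorded by the spout and 5 successive bolts.
        List<Long> arrivals = List.of(0L, 45L, 90L, 130L, 175L, 200L);
        System.out.println(hopDelaysMillis(arrivals)); // [45, 45, 40, 45, 25]
    }
}
```

Subtracting each bolt's own execute latency from its hop delay then isolates the pure messaging overhead per hop.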
To analyze the source of this extra delay, I first set the sending interval to 1s, so there should be no queuing delay caused by high computing overheads. Also, the Storm UI shows that no bolt has high CPU utilization.
Then I checked the network delay. I am using a 1Gbps network testbed and measured both RTT and bandwidth. The network latency should not be that high for a 5MB image: at 1Gbps, 5MB (40 megabits) takes roughly 40ms on the wire.
Finally, I am thinking about buffering delay. I find that each executor thread maintains its own send buffer and transfers the data to the worker's send buffer, and I am not sure how long a message sits there before the receiving bolt can get it. As suggested by the community, I increased the sender/receiver buffer sizes to 16384 and set STORM_NETTY_MESSAGE_BATCH_SIZE to 32768. However, it did not help.
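For reference, this is roughly what I put in storm.yaml (key names assuming Storm 1.x; the executor buffer settings changed in the later messaging redesign, so verify against your version):

```yaml
# Executor-level send/receive ring buffers (in tuples, power of two).
topology.executor.send.buffer.size: 16384
topology.executor.receive.buffer.size: 16384
# Worker-level transfer buffer feeding the Netty client.
topology.transfer.buffer.size: 16384
# Netty transfer batch size in bytes (STORM_NETTY_MESSAGE_BATCH_SIZE).
storm.messaging.netty.transfer.batch.size: 32768
```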
My question is: how can I remove or reduce the messaging overhead between bolts in different workers? Is it possible to make the communication between bolts synchronous, so that the receiver gets the message immediately, without any buffering delay?
Upvotes: 0
Views: 875
Reputation: 16177
Based on your comment above, you are including roughly 5MB images in each message.
I don't know Kafka/Storm in great detail, but my understanding is that it is a mainstream message broker. Such systems are not designed to deal with large payloads, primarily because of the guarantees they provide regarding delivery and persistence, both of which require processing steps that buffer the byte stream, in most cases multiple times. This gives you greater-than-linear growth in latency as message size increases.
My recommendation would be to store your images in something fast like Couchbase or Memcached, then send a message containing only a pointer to the image. Such a setup would not be difficult to get up and running in under a day.
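A minimal sketch of this "send a pointer, not the payload" pattern (plain Java; the in-memory map is a stand-in for an external store such as Memcached or Couchbase, and the method names are mine, not a real client API):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PointerMessage {
    // Stand-in for the external image store (replace with a real cache client).
    static final Map<String, byte[]> imageStore = new ConcurrentHashMap<>();

    // Producer side: store the 5MB image, emit only its key (a few bytes).
    static String storeAndGetKey(byte[] image) {
        String key = UUID.randomUUID().toString();
        imageStore.put(key, image);
        return key;
    }

    // Consumer side: a bolt fetches the image by key when it needs the pixels.
    static byte[] fetch(String key) {
        return imageStore.get(key);
    }

    public static void main(String[] args) {
        byte[] image = new byte[5 * 1024 * 1024]; // simulated 5MB frame
        String key = storeAndGetKey(image);       // the tuple carries only this key
        System.out.println("tuple payload ~ " + key.length() + " bytes instead of " + image.length);
    }
}
```

The tuples flowing through the topology then stay tiny, so serialization and transfer costs per hop no longer scale with the image size.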
Upvotes: 0
Reputation: 73
Through detailed benchmarking (inserting timestamps into Storm's source code), I find the "Serialization" step takes up to 30ms when passing two 1440x1080 images. If I instead pass a plain byte array in the tuple, I think this step can be removed and thus cut down the latency...
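To illustrate why a raw byte[] field is cheaper to serialize than a per-pixel image representation (this sketch uses plain java.io, not Storm's Kryo path, so the absolute numbers differ; only the 1440x1080 size comes from the post):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SerializeCost {
    // Serialize an object and return the number of bytes produced.
    static int serializedSize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int w = 1440, h = 1080;
        byte[] rawBytes = new byte[w * h]; // pre-encoded image bytes: one bulk write
        int[] pixels = new int[w * h];     // per-pixel representation: 4 bytes each

        long t0 = System.nanoTime();
        int a = serializedSize(rawBytes);
        long t1 = System.nanoTime();
        int b = serializedSize(pixels);
        long t2 = System.nanoTime();

        System.out.printf("byte[]: %d bytes in %.1f ms%n", a, (t1 - t0) / 1e6);
        System.out.printf("int[]:  %d bytes in %.1f ms%n", b, (t2 - t1) / 1e6);
    }
}
```

The byte array both serializes faster and produces a smaller wire payload, which is the effect the benchmark above is chasing.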
Upvotes: 0
Reputation: 368
For low latencies you may need to tune the Netty buffer and transfer batch sizes. Some of this delay could be inherent in the messaging and threading model of the current worker.
Also try adjusting the disruptor configs.
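For example, assuming Storm 1.x, the disruptor batching knobs look like this in storm.yaml (key names from the 1.x Config class, so verify against your version; smaller values trade throughput for lower latency):

```yaml
# Flush a partially filled batch after this many ms (lower = lower latency).
topology.disruptor.batch.timeout.millis: 1
# Max tuples buffered in the disruptor before a batch is flushed downstream.
topology.disruptor.batch.size: 1
```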
That said, the community is making an effort to improve latency and throughput by redesigning the messaging subsystem. See https://github.com/apache/storm/pull/2502
Upvotes: 0