Reputation: 457
I was testing the enqueue and dequeue rates of Redis over a network with 1 Gbps LAN speed; both machines have 1 Gbps Ethernet cards. Redis version: 3.2.11.
I LPUSH 1L (100,000) items of 1 byte each using the Python client. Dequeuing the items with RPOP over the network took around 55 secs, which is just ~1,800 dequeues/sec, whereas the same operation completes within 5 secs when I dequeue locally, which is around 20,000 dequeues/sec.
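Roughly what my test looks like (a minimal sketch; the host and key name here are placeholders):

import time
import redis

r = redis.StrictRedis(host='redis-host', port=6379)  # placeholder remote host

N = 100000  # 1L items
for _ in range(N):
    r.lpush('testq', 'x')  # 1-byte payload per item

start = time.time()
while r.rpop('testq') is not None:  # one network round trip per item
    pass
elapsed = time.time() - start
print('%d dequeues in %.1fs (%.0f/sec)' % (N, elapsed, N / elapsed))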
Enqueue rates are close to the dequeue rates.
This was done on the office network when there was not much usage. The same is observed in production environments too!
A drop of less than 3x over the network would be acceptable, but around 10x looks like I am doing something wrong.
Please suggest whether I need to make any configuration changes on the server or client side.
Thanks in advance.
Upvotes: 0
Views: 623
Reputation: 1400
Retroactively replying in case anyone else discovers this question.
Round-trip latency and concurrency are likely your bottlenecks here. If all of the dequeue calls happen serially, then you are stacking that network latency: with 1 million calls at 2 ms latency, you'd have at least 2 million ms of latency overhead, or about 33 minutes. This is to say that your application is waiting for the server to receive the payload, do something, and reply to acknowledge that the operation was successful. Some Redis clients also perform multiple calls to enqueue/dequeue a single job (pop & ack/del), potentially doubling that number.
The following link illustrates the different approaches to Redis keys taken by different libraries (Ruby's resque vs. Clojure's carmine; note the multiple Redis commands that are executed on the Redis server for a single message). This is likely the cause of the 10x drop vs. the 3x you were expecting.
https://kirshatrov.com/2018/07/20/redis-job-queue/
An oversimplified example of two calls per message dequeued (network latency of 1 ms each way, and each Redis server operation takes 1 ms):
time | client                   | server
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1ms  | pop msg      >--(1ms)--> | receive pop request
2ms  |                          | [process request (1ms)]
3ms  | receive msg  <--(1ms)--< | send msg to client
4ms  | send del     >--(1ms)--> | receive del
5ms  |                          | [delete msg from queue (1ms)]
6ms  | receive ack  <--(1ms)--< | reply with delete ack
Improving dequeue times often involves using a client that supports multi-threaded or multi-process concurrency (e.g. 10 concurrent workers would significantly reduce the overall time to completion). This ensures your network is better utilized by sending a stream of dequeue requests, instead of waiting for one request to complete before grabbing the next one.
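For example, here is a rough sketch of the multi-process approach with redis-py (the host, key name, and worker count are placeholders; each process opens its own connection so round trips overlap on the wire):

import multiprocessing
import redis

def worker(_):
    # Each worker process opens its own connection; placeholder host/key.
    r = redis.StrictRedis(host='redis-host', port=6379)
    popped = 0
    while r.rpop('testq') is not None:  # RPOP is atomic, so workers never double-pop
        popped += 1
    return popped

if __name__ == '__main__':
    workers = 10
    pool = multiprocessing.Pool(workers)
    counts = pool.map(worker, range(workers))  # 10 streams of overlapping round trips
    print('dequeued %d items total' % sum(counts))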
As for 1 byte vs. 500 bytes: the default Ethernet MTU is 1500 bytes. Subtracting the IP and TCP headers, the usable payload is ~1460 bytes (less if tunneling with GRE/IPsec, more if using jumbo frames). Since both payload sizes would fit in a single packet, they will have similar performance characteristics.
A 1 Gbps Ethernet interface can deliver anywhere between 81,274 and 1,488,096 packets per second (depending on frame size).
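Those bounds fall straight out of the frame-rate arithmetic (a back-of-the-envelope check; each frame also carries 20 bytes of preamble and inter-frame gap on the wire):

line_rate = 10**9        # bits per second on a 1 Gbps link
overhead = 20            # preamble + inter-frame gap, bytes per frame
for frame in (64, 1518): # minimum and maximum standard Ethernet frame sizes
    pps = line_rate / ((frame + overhead) * 8.0)
    print('%d-byte frames: %.0f packets/sec' % (frame, pps))
# prints ~1488095 pps for 64-byte frames and ~81274 pps for 1518-byte frames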
So really, it's a question of how many processes and threads you can run concurrently on the client to keep the network and the Redis server busy.
Upvotes: 1
Reputation: 39568
Redis is generally I/O bound, not CPU bound. It may be hitting network bandwidth limits. Given the small size of your messages, most of the bandwidth may be eaten by TCP overhead.
On a local machine you are bound by memory bandwidth, which is much faster than your 1 Gbps network bandwidth. You can likely increase network throughput by increasing the amount of data you grab at a time.
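For instance, you could batch many RPOPs into one round trip with a redis-py pipeline (a sketch; the key name and batch size are placeholders, and since RPOP on Redis 3.2 has no count argument, pipelining is the way to batch):

import redis

r = redis.StrictRedis(host='redis-host', port=6379)  # placeholder host

BATCH = 100  # 100 RPOPs per network round trip instead of 1
while True:
    pipe = r.pipeline(transaction=False)
    for _ in range(BATCH):
        pipe.rpop('testq')
    items = [i for i in pipe.execute() if i is not None]
    if not items:
        break
    # ... process the batch of items here ...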
Upvotes: 0