dmc
dmc

Reputation: 411

Scalability of Redis Cluster using Jedis 2.8.0 to benchmark throughput

I have an instance of JedisCluster shared between N threads that perform set operations.

When I run with 64 threads, the throughput of set operations is only slightly increased (compared to running using 8 threads).

How to configure the JedisCluster instance using the GenericObjectPoolConfig so that I can maximize throughput as I increase the thread count?

I have tried

 GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
 poolConfig.setMaxTotal(64);

 jedisCluster = new JedisCluster(jedisClusterNodes, poolConfig);

believing this could increase the number of jedisCluster connection to the cluster and so boost throughput.

However, I observed a minimal effect.

Upvotes: 1

Views: 976

Answers (1)

mp911de
mp911de

Reputation: 18119

When talking about performance, we need to dig into details a bit before I can actually answer your question.

A naive approach suggests: The more Threads (concurrency), the higher the throughput.

My statement is not wrong, but it is also not true. Concurrency and the resulting performance are not (always) linear because there is so many involved behind the scenes. Turning something from sequential to concurrent processing might result in something that executes twice of the work compared to sequential execution. This example assumes that you run a multi-core machine, that is not occupied by anything else and it has enough bandwidth for the required work processing (I/O, Network, Memory). If you scale this example from two threads to eight, but your machine has only four physical cores, weird things might happen.

First of all, the processor needs to schedule two threads so each of the threads probably behaves as if they would run sequentially, except that the process, the OS, and the processor have increased overhead caused by twice as many threads as cores. Orchestrating these guys comes at a cost that needs to be paid at least in memory allocation and CPU time. If the workload requires heavy I/O, then the work processing might be limited by your I/O bandwidth and running things concurrently may increase throughput as the CPU is mostly waiting until the I/O comes back with the data to process. In that scenario, 4 threads might be blocked by I/O while the other 4 threads are doing some work. Similar applies to memory and other resources utilized by your application. Actually, there's much more that digs into context switching, branch prediction, L1/L2/L3 caching, locking and much more that is enough to write a 500-page book. Let's stay at a basic level.

Resource sharing and certain limitations lead to different scalability profiles. Some are linear until a certain concurrency level, some hit a roof and adding more concurrency results in the same throughput, some have a knee when adding concurrency makes it even slower because of $reasons.

Effects on scalability

Now, we can analyze how Redis, Redis Cluster, and concurrency are related.

  • Redis is a network service which requires network I/O. Networking might be obvious, but we require to add this fact to our considerations meaning a Redis server shares its network connection with other things running on the same host and things that use the switches, routers, hubs, etc. Same applies to the client, even in the case you told everybody else not to run anything while you're testing.

  • The next thing is, Redis uses a single-threaded processing model for user tasks (Don't want to dig into Background I/O, lazy-free memory freeing and asynchronous replication). So you could assume that Redis uses one CPU core for its work but, in fact, it can use more than that. If multiple clients send commands at a time, Redis processes commands sequentially, in the order of arrival (except for blocking operations, but let's leave this out for this post). If you run N Redis instances on one machine where N is also the number of CPU cores, you can easily run again into a sharing scenario - That is something you might want to avoid.

  • You have one or many clients that talk to your Redis server(s). Depending on the number of clients involved in your test, this has an effect. Running 64 threads on a 8 core machine might be not the best idea since only 8 cores can execute work at a time (let's leave hyper-threading and all that out of here, don't want to confuse you too much). Requesting more than 8 threads causes time-sharing effects. Running a bit more threads than CPU cores for Redis and other networked services isn't a too bad of an idea since there is always some overhead/lag coming from the I/O (network). You need to send packets from Java (through the JVM, the OS, the network adapter, routers) to Redis (routers, network, yadda yadda yadda), Redis has to process the commands and send the response back. This usually takes some time.

  • The client itself (assuming concurrency on one JVM) locks certain resources for synchronization. Especially requesting new connections with using existing/creating new connections is a scenario for locking. You already found a link to the Pool config. While one thread locks a resource, no other thread can access the resource.

Knowing the basics, we can dig into how to measure throughput using jedis and Redis Cluster:

  • Congestion on Redis Cluster can be an issue. If all client threads are talking to the same cluster node, then other cluster nodes are idle, and you effectively measured how one node behaves but not the cluster: Solution: Create an even workload (Level: Hard!)

  • Congestion on the Client: Running 64 threads on a 8 core machine (that is just my assumption here, so please don't beat me up if I'm wrong) is not the best idea. Raising the number of threads on a client a bit above the number of Cluster nodes (assuming even workload for each cluster node) and a bit over the number of CPU cores can improve performance is never a too bad idea. Having 8x threads (compared to the number of CPU cores) is an overkill because it adds scheduling overhead at all levels. In general, performance engineering is related to finding the best ratio between work, overhead, bandwidth limitations and concurrency. So finding the best number of threads is an own field in computer science.

  • If running a test using multiple systems, that run a number of total threads, is something that might be closer to a production environment than running a test from one system. Distributed performance testing is a master class (Level: Very hard!) The trick here is to monitor all resources that are used by your test making sure nothing is overloaded or finding the tipping point where you identify the limit of a particular resource. Monitoring the client and the server are just the easy parts.

Since I do not know your setup (number of Redis Cluster nodes, distribution of Cluster nodes amongst different servers, load on the Redis servers, the client, and the network during test caused by other things than your test), it is impossible to say what's the cause.

Upvotes: 1

Related Questions