Vincent Y

Reputation: 119

GridGain Throughput bottleneck using key value API

I am using GridGain Ignite as an in-memory database to serve data. The data is stored as <Integer, String> key-value pairs, and each request contains a list of keys. I use cache.getAll(keys).values() to fetch all the values from the Ignite cluster.
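For reference, a minimal sketch of the lookup pattern described above. The JCache-style getAll takes a Set of keys and returns a Map, so duplicate keys in a request collapse and missing keys are simply absent from the result. The class name, the helper names, and the in-memory HashMap used as a stand-in for the cache are all hypothetical, so that the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BatchLookupSketch {
    // Hypothetical stand-in for cache.getAll(keys): returns entries only for keys that exist.
    public static Map<Integer, String> getAll(Map<Integer, String> store, Set<Integer> keys) {
        Map<Integer, String> found = new HashMap<>();
        for (Integer key : keys) {
            String value = store.get(key);
            if (value != null) {
                found.put(key, value);
            }
        }
        return found;
    }

    // Deduplicates the requested keys, does one bulk lookup, then restores request order.
    // Keys that are not in the store yield null in the result list.
    public static List<String> lookupInRequestOrder(Map<Integer, String> store, List<Integer> requested) {
        Map<Integer, String> found = getAll(store, new HashSet<>(requested));
        List<String> result = new ArrayList<>(requested.size());
        for (Integer key : requested) {
            result.add(found.get(key));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> store = new HashMap<>();
        store.put(1, "a");
        store.put(2, "b");
        store.put(3, "c");
        // Key 3 is requested twice and key 9 does not exist.
        System.out.println(lookupInRequestOrder(store, Arrays.asList(3, 1, 3, 9)));
    }
}
```

Calling .values() directly on the returned map is fine when the caller needs neither the request order nor a marker for missing keys.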

The server configuration:

The test configuration:

I want my application to support 50k RPS, but with the current configuration it only reaches 2k RPS, and the response time grows as the RPS increases. (I have verified that my server application can handle >100k RPS when it does not interact with Ignite, so I think the bottleneck is on the Ignite side.) The following are the test metrics from Locust.

[Image: Locust test metrics]

The CPU and memory usage on both my server application and the Ignite cluster is below 30%. I wonder whether there is a task queue in Ignite that blocks the cache get operation.
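One way to reason about a throughput cap with idle CPUs is Little's law (requests in flight = throughput × latency): if every request blocks on a cache round trip, throughput is bounded by how many requests are in flight at once, not by CPU. The sketch below just does that arithmetic. The concurrency of 100 and the 50 ms latency are made-up illustration values, not measurements from this setup, and the class and method names are hypothetical.

```java
public class LittlesLawSketch {
    // Upper bound on requests per second when each in-flight request
    // occupies its slot for latencyMillis (Little's law rearranged).
    public static double maxThroughputRps(int inFlight, double latencyMillis) {
        return inFlight * 1000.0 / latencyMillis;
    }

    // Number of requests that must be in flight concurrently to sustain targetRps.
    public static double requiredInFlight(double targetRps, double latencyMillis) {
        return targetRps * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 100 blocking callers, 50 ms per getAll round trip.
        System.out.println("max RPS: " + maxThroughputRps(100, 50.0));
        // Concurrency needed for the 50k RPS target at the same assumed latency.
        System.out.println("in-flight needed: " + requiredInFlight(50_000, 50.0));
    }
}
```

Under those assumed numbers the cap is 2,000 RPS, and 50k RPS would need 2,500 requests in flight, so it may be worth comparing the measured per-request latency against the number of threads or connections that can call the cache concurrently.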

Upvotes: 0

Views: 132

Answers (2)

Vincent Y

Reputation: 119

Some updates:

I tested both partitioned mode and replicated mode, each with 4 server nodes (same config). Partitioned mode is slightly better than replicated mode, but both reach only around 2k-2.5k RPS.

I also noticed that no matter how I tune the Ignite configuration (such as thread pool size), CPU usage peaks at 4-5 CPUs, even if I give each server node and client node more CPU. Because of this, I suspect that adding more server nodes yields a larger performance gain than changing the cache mode or the number of backups.

One workaround is to "flatten" the cluster: scale the nodes horizontally, but give each node fewer resources. However, this is a fairly brute-force approach, and I'm not sure whether Ignite's CPU usage can be optimized instead.

Upvotes: 0

user21160483

Reputation: 149

I would have to ask: why are you using a replicated cache? With a partitioned cache on 4 nodes, statistically about 1/4 of your keys would reside on each host, so each host (with its associated network interface) would return ~1/4 of your overall data set. With a replicated cache, each host can return all requested keys, and as a result the cluster does more network-related work. Generally you want a replicated cache if you have a small table that is often used in joins; partitioned caches are recommended for large tables such as this one. Hope that helps.
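To illustrate the 1/4-per-host distribution, here is a rough sketch that maps integer keys to partitions and partitions to nodes. It uses a plain modulo mapping as a simplified stand-in; Ignite's actual affinity function works differently, and the partition count of 1024, the class name, and the method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSpreadSketch {
    // Assumed partition count, for illustration only.
    static final int PARTITIONS = 1024;

    // Simplified stand-in for an affinity function: key -> partition -> node index.
    public static int nodeFor(int key, int nodes) {
        int partition = Math.floorMod(Integer.hashCode(key), PARTITIONS);
        return partition % nodes;
    }

    // Groups the keys of one bulk request by the node that would own them.
    public static Map<Integer, List<Integer>> groupByNode(List<Integer> keys, int nodes) {
        Map<Integer, List<Integer>> byNode = new TreeMap<>();
        for (int key : keys) {
            byNode.computeIfAbsent(nodeFor(key, nodes), n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(i);
        }
        // Print how many of the 1000 requested keys each of the 4 nodes would serve.
        groupByNode(keys, 4).forEach((node, owned) ->
                System.out.println("node " + node + ": " + owned.size() + " keys"));
    }
}
```

With this toy mapping, a 1000-key request splits into four sub-requests of 250 keys each, which is the even spread described above.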

Upvotes: 0
