
Tags: performance, optimization, ignite, gridgain, throughput

Reputation: 119

GridGain Throughput bottleneck using key value API

I am using GridGain (Ignite) as an in-memory database to serve data. The data is stored as <Integer, String> key-value pairs, and each request contains a list of keys; I use cache.getAll(keys).values() to fetch all the corresponding values from the Ignite cluster.
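The read path described above can be sketched in plain Java. This is a sketch under assumptions: `fakeGetAll` is a hypothetical in-memory stand-in for `cache::getAll` (IgniteCache.getAll takes a Set of keys and returns a Map), so the example runs without an Ignite cluster. One detail worth keeping in mind: as far as I know, the Map returned by getAll is not guaranteed to iterate in request order, so if the response must follow the key order, looking values up by key is safer than calling values().

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OrderedGetAll {

    // Fetches values for 'keys' through a getAll-style function and returns
    // them in the same order as the requested keys. Missing keys map to null.
    static List<String> orderedValues(List<Integer> keys,
                                      Function<Set<Integer>, Map<Integer, String>> getAll) {
        // getAll takes a Set, so de-duplicate while keeping the request order.
        Set<Integer> keySet = new LinkedHashSet<>(keys);
        Map<Integer, String> result = getAll.apply(keySet);
        // Look values up by key instead of relying on result.values() order.
        return keys.stream().map(result::get).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical in-memory stand-in for the Ignite cache.
        Map<Integer, String> store = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            store.put(i, "value-" + i);
        }
        Function<Set<Integer>, Map<Integer, String>> fakeGetAll = ks -> {
            Map<Integer, String> out = new HashMap<>();
            for (Integer k : ks) {
                String v = store.get(k);
                if (v != null) {
                    out.put(k, v);
                }
            }
            return out;
        };
        List<String> values = orderedValues(List.of(42, 7, 999), fakeGetAll);
        System.out.println(values); // [value-42, value-7, value-999]
    }
}
```

With a real thick client, `cache::getAll` on an `IgniteCache<Integer, String>` would take the place of `fakeGetAll`.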

The server configuration:

- Ignite version: GridGain Community 8.8.34
- Ignite cluster: 4 server nodes, each with 14 CPUs and 100 GiB of memory
- Cache: REPLICATED mode, so every node holds a full copy of the data (about 50 GiB per copy)
- My server: Ignite thick clients, 10 pods in total, each with 6 CPUs and 10 GiB of memory

The test configuration:

- Keys per request: 700
- Response body size: ~850 KB
- Locust load: 5000 users, spawn rate 10 users/s

I want the application to support 50k RPS, but with the current configuration it only reaches about 2k RPS, and the response time increases as the RPS increases. (I have tested that my server application alone can handle more than 100k RPS when it does not call Ignite, so I think the bottleneck is on the Ignite side.) The following are the test metrics from Locust.
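To put the numbers from the test into perspective, here is the back-of-the-envelope arithmetic. It assumes 850 KB means 850,000 bytes and ignores protocol and serialization overhead; ThroughputMath is only an illustrative helper, not part of any library.

```java
public class ThroughputMath {

    // Bytes per response, taken from the test description (~850 KB, assumed 850,000 bytes).
    static final double RESPONSE_BYTES = 850_000.0;

    // Payload bytes per second that must cross the network at a given RPS.
    static double bytesPerSecond(double rps) {
        return rps * RESPONSE_BYTES;
    }

    // The same figure in gigabits per second (1 Gbit = 1e9 bits).
    static double gigabitsPerSecond(double rps) {
        return bytesPerSecond(rps) * 8 / 1e9;
    }

    // Individual key lookups per second for a given number of keys per request.
    static double keyLookupsPerSecond(double rps, int keysPerRequest) {
        return rps * keysPerRequest;
    }

    public static void main(String[] args) {
        for (double rps : new double[] {2_000, 50_000}) {
            System.out.printf("%.0f RPS -> %.1f Gbit/s payload, %.0f key lookups/s%n",
                    rps, gigabitsPerSecond(rps), keyLookupsPerSecond(rps, 700));
        }
    }
}
```

Under these assumptions, 2k RPS already corresponds to about 13.6 Gbit/s of response payload, while 50k RPS would need about 340 Gbit/s and 35 million key lookups per second. The same payload most likely crosses the network twice (Ignite servers to the thick clients, then clients to Locust), so whether bandwidth is the limiting factor depends on the pods' network capacity, which the question does not state.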

CPU and memory usage on both my server application and the Ignite cluster stay below 30%. Is there some task queue in Ignite that could be blocking the cache get operations?
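One way to see why latency climbs while CPU stays low is the interactive response time law, a form of Little's law: R = N / X - Z, where N is the number of concurrent users, X the throughput and Z the think time. The sketch assumes each Locust user sends its next request immediately (Z = 0), which the question does not state; with a nonzero wait time the per-request latency is correspondingly lower.

```java
public class LittlesLaw {

    // Interactive response time law: R = N / X - Z, where N is the number of
    // concurrent users, X the throughput (requests/s) and Z the think time (s).
    static double responseTimeSeconds(int users, double throughputRps, double thinkSeconds) {
        if (throughputRps <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return users / throughputRps - thinkSeconds;
    }

    public static void main(String[] args) {
        // 5000 Locust users, throughput capped at ~2k RPS, assumed zero think time.
        System.out.println(responseTimeSeconds(5000, 2000, 0.0) + " s"); // 2.5 s
    }
}
```

Under that assumption, 5000 users against a 2k RPS ceiling gives an average response time of about 2.5 s, and adding users only raises it further. Rising latency at flat throughput is what a saturated queue somewhere in the request path looks like; by itself it does not show where that queue is (Ignite thread pools, the network, or the client pods).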

Upvotes: 0

Answers (2)

Vincent Y

Reputation: 119

Some updates:

I tested both partitioned mode and replicated mode with the same 4 server nodes (same configuration). Partitioned mode is slightly better than replicated mode, but both reach only around 2k-2.5k RPS.

I also noticed that no matter how I tune the Ignite configuration (for example thread pool sizes), CPU usage never goes above 4-5 CPUs, even if I give each server node and client node more CPUs. Because of this, I suspect that adding more server nodes would bring a larger performance gain than changing the cache mode or the number of backups.

One workaround is to "flatten" the cluster: scale horizontally with more nodes, each with fewer resources. However, this is a brute-force approach, and I'm not sure whether Ignite's CPU usage can be optimized instead.

Upvotes: 0

user21160483

Reputation: 149

I would have to ask: why are you using a replicated cache? If you were using a partitioned cache, then statistically 1/4 of your keys would reside on each host. With this distribution pattern, each host (with its associated network interface) would return ~1/4 of your overall data set. With a replicated cache, each host can return all of the requested keys, and as a result the cluster does more network-related work! Generally, you want to use a replicated cache if you have a small table that is used often in joins. Partitioned caches are recommended for large tables such as this one! Hope that helps.
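The "1/4 of your keys per host" estimate can be illustrated with a simplified simulation. This is not Ignite's actual affinity function (Ignite uses rendezvous hashing over partitions, 1024 by default); here, as an assumption, keys map to 1024 partitions by plain hash modulo and partitions are assigned round-robin to 4 nodes, just to show the expected spread of one 700-key request. The key range is hypothetical.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class PartitionSpread {

    static final int PARTITIONS = 1024; // matches Ignite's default partition count
    static final int NODES = 4;

    // Simplified mapping (assumption): key -> partition by hash modulo,
    // partition -> node round-robin. Not Ignite's rendezvous affinity.
    static int nodeFor(int key) {
        int partition = Math.floorMod(Integer.hashCode(key), PARTITIONS);
        return partition % NODES;
    }

    // Counts how many of the requested keys land on each node.
    static int[] keysPerNode(Set<Integer> keys) {
        int[] counts = new int[NODES];
        for (int key : keys) {
            counts[nodeFor(key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        Set<Integer> keys = new HashSet<>();
        while (keys.size() < 700) {
            keys.add(rnd.nextInt(50_000_000)); // hypothetical key range
        }
        int[] counts = keysPerNode(keys);
        for (int n = 0; n < NODES; n++) {
            System.out.println("node " + n + ": " + counts[n] + " keys");
        }
    }
}
```

In this model a single 700-key getAll fans out to all four nodes, each returning roughly 175 values (about 212 KB of the 850 KB response), whereas by the answer's reasoning a replicated cache lets any single node be asked for the whole set.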

Upvotes: 0
