user1813228


Astyanax client maximum connections per node?

I am reading data from a Cassandra database using the Astyanax client.

I have around one million unique rows in the Cassandra database. I have a single four-node cluster that spans colocation centres.

These are my four nodes:

  node1:9160
  node2:9160
  node3:9160
  node4:9160

I have key caching enabled, and SizeTieredCompactionStrategy is enabled as well.

I have a multithreaded client program that reads data from the Cassandra database using the Astyanax client. When I run it with 20 threads, read performance degrades.

So the first thing that jumps to my mind is that there might be contention over connections to Cassandra. Does Astyanax use a connection pool, and if so, how many connections does it maintain? I am using the code below to create the connection with the Astyanax client.

private CassandraAstyanaxConnection() {
    context = new AstyanaxContext.Builder()
    .forCluster(ModelConstants.CLUSTER)
    .forKeyspace(ModelConstants.KEYSPACE)
    // Single configuration object: a second withAstyanaxConfiguration()
    // call would replace the first and drop the discovery type.
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
        .setCqlVersion("3.0.0")
        .setTargetCassandraVersion("1.2")
    )
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(1)
        .setSeeds("node1:9160,node2:9160,node3:9160,node4:9160")
    )
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    keyspace = context.getEntity();

    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY,
        StringSerializer.get(),
        StringSerializer.get());
}

Do I need to change anything in the above code to improve the performance?

What does this method do?

   setMaxConnsPerHost(1)

Do I need to increase that to improve the performance? I have four nodes, so should I change it to 4?

And what about the setMaxConns(20) method call? Do I need to add that as well to improve the performance, given that I will be running my program with multiple threads?

Upvotes: 5

Views: 2867

Answers (1)

Wildfire

Reputation: 6418

For details on maxConnsPerHost/maxConns, see this answer: setMaxConns and setMaxConnsPerHost in Astyanax client

And yes, maxConnsPerHost should be increased to achieve good performance. The optimal value depends on network topology, replication factor, storage configuration, caching, read/write ratio, etc.

I don't think it's possible to achieve optimal performance for a heavily loaded cluster without experiments and simulations.

For tasks with moderate load on Cassandra I usually use a rule of thumb:

maxConnsPerHost ~= <Number of cores per host>/<Replication factor> + 1

That is, for a cluster of 8-core boxes with replication factor 3, maxConnsPerHost should be around 4. This value is also a good starting point for experiments in heavy-load scenarios.

The motivation: a cluster of N nodes, each with C cores, has N * C cores in total. Processing a request with replication factor R takes R cores (on different nodes). So, at any given moment the cluster can process up to N * C / R requests. It's a good idea to keep the number of concurrent connections around this figure. Divide it by N to get the number of connections per host, which is C / R. Add 1 spare connection per host to cover network latency, etc. That's it.
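To make the arithmetic concrete, here is a minimal Java sketch of the rule of thumb. The class and method names are made up for illustration and are not part of Astyanax. Rounding to the nearest integer is also my assumption, chosen because the rule only says "around" a value.

```java
// Hypothetical helper, not part of Astyanax: evaluates the rule of thumb
// maxConnsPerHost ~= <cores per host> / <replication factor> + 1.
final class ConnsRuleOfThumb {

    // Estimated number of requests the whole cluster can process at once: N * C / R.
    static double clusterConcurrentRequests(int nodes, int coresPerHost, int replicationFactor) {
        checkPositive(nodes, "nodes");
        checkPositive(coresPerHost, "coresPerHost");
        checkPositive(replicationFactor, "replicationFactor");
        return (double) nodes * coresPerHost / replicationFactor;
    }

    // Per-host share of that estimate (C / R) plus one spare connection.
    // Rounding to the nearest integer is an assumption of this sketch.
    static int maxConnsPerHost(int coresPerHost, int replicationFactor) {
        checkPositive(coresPerHost, "coresPerHost");
        checkPositive(replicationFactor, "replicationFactor");
        return (int) Math.round((double) coresPerHost / replicationFactor + 1);
    }

    private static void checkPositive(int value, String name) {
        if (value < 1) {
            throw new IllegalArgumentException(name + " must be >= 1, got " + value);
        }
    }

    public static void main(String[] args) {
        // 8-core boxes, replication factor 3: 8 / 3 + 1 = 3.67, rounded to 4.
        System.out.println(maxConnsPerHost(8, 3));
    }
}
```

The result is only a starting point; it would be passed to setMaxConnsPerHost(...) in the pool configuration and then adjusted by experiment.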

Update: Simple client performance tuning:

  • Start with some maxConnsPerHost value
  • Simulate load and observe CPU usage and the org.apache.cassandra.request->***Stage->pendingTasks JMX attributes
  • Increase maxConnsPerHost until pendingTasks starts to increase rapidly. This is probably the optimal value.
  • CPU load on cluster nodes should be around 50-70%. If it's much lower, there's probably something wrong with the server configuration.
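The selection step in the procedure above can be sketched as follows, assuming you have already run the load simulation for several maxConnsPerHost values and written down the pendingTasks reading for each. Everything here is hypothetical: the class name, the growthFactor threshold used to decide what "rapidly" means, and the choice to return the last setting before the jump are all assumptions, and the numbers in main are made-up sample readings, not measurements.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: picks a maxConnsPerHost candidate from recorded
// (maxConnsPerHost -> observed pendingTasks) pairs.
final class PendingTasksKnee {

    // Walks the settings in ascending order and returns the last one seen
    // before pendingTasks grows by more than growthFactor between two
    // consecutive settings. growthFactor is an arbitrary threshold of this sketch.
    static int lastValueBeforeRapidGrowth(TreeMap<Integer, Long> pendingByMaxConns,
                                          double growthFactor) {
        if (pendingByMaxConns.isEmpty()) {
            throw new IllegalArgumentException("no measurements given");
        }
        Integer best = null;
        long previousPending = 0;
        for (Map.Entry<Integer, Long> entry : pendingByMaxConns.entrySet()) {
            long pending = entry.getValue();
            // Treat a previous reading of 0 as 1 so the comparison stays meaningful.
            if (best != null && pending > Math.max(previousPending, 1) * growthFactor) {
                return best;
            }
            best = entry.getKey();
            previousPending = pending;
        }
        return best; // no rapid growth observed in the tested range
    }

    public static void main(String[] args) {
        TreeMap<Integer, Long> readings = new TreeMap<>();
        // Made-up sample readings for illustration only.
        readings.put(1, 0L);
        readings.put(2, 1L);
        readings.put(4, 2L);
        readings.put(8, 40L);
        readings.put(16, 300L);
        System.out.println(lastValueBeforeRapidGrowth(readings, 3.0));
    }
}
```

If no jump shows up in the tested range, the sketch simply returns the largest setting tried, which suggests testing higher values.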

Upvotes: 9
