Vishal Sharma

Reputation: 1750

Scaling in Cassandra

I tested the throughput of a Cassandra cluster with 2, 3, and 4 nodes. There was a significant improvement when I went from 2 nodes to 3, but the improvement was much smaller when I went from 3 nodes to 4.

Given below are the specs of the 4 nodes:

N->No. of physical CPU cores, Ra->Total RAM, Rf->Free RAM

Node 1: N=16, Ra=189 GB, Rf=165 GB
Node 2: N=16, Ra=62 GB, Rf=44 GB
Node 3: N=12, Ra=24 GB, Rf=38 GB
Node 4: N=16, Ra=189 GB, Rf=24 GB

All nodes run RHEL 6.5.

Case 1 (2 nodes in the cluster: Node 1 and Node 2)

Throughput: 12K ops/second

Case 2 (3 nodes in the cluster: Node 1, Node 2, and Node 3)

Throughput: 20K ops/second

Case 3 (all 4 nodes in the cluster)

Throughput: 23K ops/second

One operation involved 1 read + 1 write (the read and write happen on the same row, so the row cache can't be used). In all cases, read consistency = 2 and write consistency = 1. Both reads and writes were asynchronous. The client application used DataStax's C++ driver and was run with 10 threads.
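
For illustration, one such read-then-write operation with the DataStax C++ driver might look roughly like the sketch below. This is a simplified illustration, not the actual client code: it uses the pk and status columns of the table shown further down (the column choice is arbitrary), blocks on the read future for brevity (the real client stays asynchronous, e.g. via cass_future_set_callback()), and omits error handling.

#include <cassandra.h>

/* One "operation": read a row at consistency TWO, then write the same row at ONE. */
void do_operation(CassSession* session, const char* key) {
    /* Read at consistency TWO */
    CassStatement* read_stmt =
        cass_statement_new("SELECT status FROM cass.test_table WHERE pk = ?", 1);
    cass_statement_set_consistency(read_stmt, CASS_CONSISTENCY_TWO);
    cass_statement_bind_string(read_stmt, 0, key);
    CassFuture* read_future = cass_session_execute(session, read_stmt);
    cass_statement_free(read_stmt);
    cass_future_wait(read_future);   /* blocking wait only in this sketch */
    cass_future_free(read_future);

    /* Write the same row at consistency ONE */
    CassStatement* write_stmt =
        cass_statement_new("UPDATE cass.test_table SET status = ? WHERE pk = ?", 2);
    cass_statement_set_consistency(write_stmt, CASS_CONSISTENCY_ONE);
    cass_statement_bind_int32(write_stmt, 0, 1);
    cass_statement_bind_string(write_stmt, 1, key);
    CassFuture* write_future = cass_session_execute(session, write_stmt);
    cass_statement_free(write_stmt);
    cass_future_free(write_future);
}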

Given below are the keyspace and table details:

CREATE KEYSPACE cass WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '2'}  AND durable_writes = true;

CREATE TABLE cass.test_table (
    pk text PRIMARY KEY,
    data1_upd int,
    id1 int,
    portid blob,
    im text,
    isflag int,
    ms text,
    data2 int,
    rtdata blob,
    rtdynamic blob,
    rtloc blob,
    rttdd blob,
    rtaddress blob,
    status int,
    time bigint
) WITH bloom_filter_fp_chance = 0.001
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Some parameters from cassandra.yaml are given below (all 4 nodes used similar YAML files):

commitlog_segment_size_in_mb: 32
concurrent_reads: 64
concurrent_writes: 256
concurrent_counter_writes: 32
memtable_offheap_space_in_mb: 20480
memtable_allocation_type: offheap_objects
memtable_flush_writers: 1
concurrent_compactors: 2

Some parameters from jvm.options are given below (all nodes used the same values):

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=4
-XX:MaxTenuringThreshold=6
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled

Given below are some of the client's connection-specific parameters:

cass_cluster_set_max_connections_per_host ( ms_cluster, 20 );
cass_cluster_set_queue_size_io ( ms_cluster, 102400*1024 );
cass_cluster_set_pending_requests_low_water_mark(ms_cluster, 50000);
cass_cluster_set_pending_requests_high_water_mark(ms_cluster, 100000);
cass_cluster_set_write_bytes_low_water_mark(ms_cluster, 100000 * 2024);
cass_cluster_set_write_bytes_high_water_mark(ms_cluster, 100000 * 2024);
cass_cluster_set_max_requests_per_flush(ms_cluster, 10000);
cass_cluster_set_request_timeout ( ms_cluster, 12000 );
cass_cluster_set_connect_timeout (ms_cluster, 60000);
cass_cluster_set_core_connections_per_host(ms_cluster,1);
cass_cluster_set_num_threads_io(ms_cluster,10);
cass_cluster_set_connection_heartbeat_interval(ms_cluster, 60);
cass_cluster_set_connection_idle_timeout(ms_cluster, 120);

Is there anything wrong with the configuration that would explain why Cassandra didn't scale much when the number of nodes was increased from 3 to 4?

Upvotes: 0

Views: 762

Answers (1)

Leleu Eric

Reputation: 31

During a test, you can check the thread pools using nodetool tpstats. You will be able to see whether some stages have too many pending (or blocked) tasks.

If there are no issues with the thread pools, maybe you could run a benchmark using cassandra-stress to see whether the limitation comes from your client.
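
For example (illustrative commands only; the node address and the mixed-workload parameters are placeholders, not values from your setup):

nodetool tpstats    # look for non-zero Pending/Blocked counts in the read and mutation stages
cassandra-stress mixed ratio\(write=1,read=1\) n=1000000 cl=ONE -rate threads=10 -node 10.0.0.1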

I don't know if this is only for test purposes, but as far as I know, read-before-write is an antipattern in Cassandra.

Upvotes: 1
