Evgeny

Reputation: 21

Very slow writes in Cassandra

I'm a newbie in NoSQL, and in Cassandra in particular. At the moment I'm doing some benchmarking with Cassandra and experiencing very slow write throughput.

Cassandra is said to be able to perform hundreds of thousands of inserts per second, but I'm not observing this: 1) when I send 100,000 inserts simultaneously via 8 CQL clients, throughput is ~14,470 inserts per second; 2) when I do the same via 8 Thrift clients, throughput is ~16,300 inserts per second.

I think Cassandra's performance can be improved, but I don't know what to tune. Please take a look at the test conditions below and advise. Thank you.

Test conditions:

1. The Cassandra cluster is deployed on three machines; each has an 8-core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz, 16GB of RAM, and a 1000Mb/s network link.

2. The data sample is as follows (a client-side version of these inserts is sketched below, after the keyspace description):

set MM[utf8('1:exc_source_algo:20100105000000.000000:ENTER:0')]['order_id'] = '1.0';
set MM[utf8('1:exc_source_algo:20100105000000.000000:ENTER:0')]['security'] = 'AA1';
set MM[utf8('1:exc_source_algo:20100105000000.000000:ENTER:0')]['price'] = '47.1';
set MM[utf8('1:exc_source_algo:20100105000000.000000:ENTER:0')]['volume'] = '300.0';
set MM[utf8('1:exc_source_algo:20100105000000.000000:ENTER:0')]['se'] = '1';
set MM[utf8('2:exc_source_algo:20100105000000.000000:ENTER:0')]['order_id'] = '2.0';
set MM[utf8('2:exc_source_algo:20100105000000.000000:ENTER:0')]['security'] = 'AA1';
set MM[utf8('2:exc_source_algo:20100105000000.000000:ENTER:0')]['price'] = '44.89';
set MM[utf8('2:exc_source_algo:20100105000000.000000:ENTER:0')]['volume'] = '310.0';
set MM[utf8('2:exc_source_algo:20100105000000.000000:ENTER:0')]['se'] = '1';
set MM[utf8('3:exc_source_algo:20100105000000.000000:ENTER:0')]['order_id'] = '3.0';
set MM[utf8('3:exc_source_algo:20100105000000.000000:ENTER:0')]['security'] = 'AA2';
set MM[utf8('3:exc_source_algo:20100105000000.000000:ENTER:0')]['price'] = '0.35';

3. The commit log is written on the local hard drive; the data is written on Lustre.

4. Keyspace description

Keyspace: MD:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:1]
  Column Families:
    ColumnFamily: MM
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 2.3249999999999997/1440/496 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: []
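
For reference, the inserts from point 2 could be issued from Python roughly as below. This is a minimal sketch assuming the pycassa Thrift client; the node host name is a placeholder and the batch size is an arbitrary choice, neither taken from the actual test setup.

# Sketch: the point-2 data sample written via pycassa (assumed client).
# Host name and queue_size are placeholders, not from the original setup.
import pycassa

pool = pycassa.ConnectionPool('MD', server_list=['node1:9160'])
cf = pycassa.ColumnFamily(pool, 'MM')

rows = {
    '1:exc_source_algo:20100105000000.000000:ENTER:0':
        {'order_id': '1.0', 'security': 'AA1', 'price': '47.1',
         'volume': '300.0', 'se': '1'},
    '2:exc_source_algo:20100105000000.000000:ENTER:0':
        {'order_id': '2.0', 'security': 'AA1', 'price': '44.89',
         'volume': '310.0', 'se': '1'},
}

# Queuing several mutations per Thrift round trip usually beats one
# call per column, which is what the CLI 'set' lines above amount to.
b = cf.batch(queue_size=50)
for key, columns in rows.items():
    b.insert(key, columns)
b.send()  # flush anything still queued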

Upvotes: 1

Views: 4265

Answers (2)

DNA

Reputation: 42617

Especially with Python clients, you may see better performance by running each client as a separate process rather than a thread, due to the global interpreter lock (GIL).
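
A minimal sketch of that idea, assuming the pycassa client and the multiprocessing module; the host name, worker count, and per-worker insert count are placeholders (8 workers x 12,500 inserts = the 100,000 inserts from the question):

# Sketch: one OS process per writer, so the GIL never serializes them.
# pycassa usage and host name are assumptions, not from the original post.
import multiprocessing
import pycassa

def writer(worker_id, n_inserts):
    # Each process opens its own connection pool; pools should not be
    # shared across fork boundaries.
    pool = pycassa.ConnectionPool('MD', server_list=['node1:9160'])
    cf = pycassa.ColumnFamily(pool, 'MM')
    for i in range(n_inserts):
        key = '%d:%d:exc_source_algo:ENTER:0' % (worker_id, i)
        cf.insert(key, {'order_id': str(float(i)), 'se': '1'})
    pool.dispose()

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=writer, args=(w, 12500))
             for w in range(8)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()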

After that, try splitting the clients onto multiple machines if possible.

Also, make sure your clients are contacting all three nodes so the workload is evenly spread.
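
With pycassa, for instance, that just means listing every node in the pool's server_list (host names assumed):

import pycassa

# Listing all three nodes lets the pool spread connections across the
# cluster instead of funneling every request through one coordinator.
pool = pycassa.ConnectionPool('MD', server_list=['node1:9160',
                                                 'node2:9160',
                                                 'node3:9160'])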

Writing data to Lustre rather than local disk might be a factor, but I don't have enough experience with Lustre to say.

Upvotes: 2

sbridges

Reputation: 25150

Are you using 8 threads/processes to do writes? If each write takes 0.5 ms, then 8 threads/processes can only do 16,000 writes per second.
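
Spelled out (the 0.5 ms per-write latency is the hypothetical figure above, not a measured one):

# Synchronous writers are latency-bound: each one completes at most
# 1 / latency writes per second, no matter how fast the cluster is.
workers = 8
latency_s = 0.0005            # assumed 0.5 ms per write
print(workers / latency_s)    # 16000.0 writes/second

# Inverted: hitting 100,000 writes/s at that latency would need
# 100000 * 0.0005 = 50 concurrent writers.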

Upvotes: 2
