Jay

Reputation: 607

Cassandra lightweight transaction performance penalty

I have two Cassandra tables, a record table and a counter table. The counter table keeps a counter for each kind of record in the record table.

When I insert a new record into the record table, I update the counter table at the same time. But the new record may already be in the record table. It's OK to insert the same record twice, but then I would increment the counter twice, which is not correct.

I have two solutions now.

  1. Fetch the record from Cassandra by the new record's key. If a record comes back, I neither insert the record nor increment the counter.

  2. Use a lightweight transaction to let Cassandra check whether the record already exists.

Solution 2 makes the insert "atomic", but the documentation says it carries a performance penalty. Solution 1 sends two queries, which also carries a performance penalty.

Currently I'm using solution 1. I'm new to Cassandra lightweight transactions, so I don't know what the atomicity costs. Does anyone know which solution is better?

Upvotes: 3

Views: 2257

Answers (1)

yurgis

Reputation: 4067

Basically you have a few options:

  1. Insert "trusted" unique - you "somehow" know ahead of time that the items you insert do not exist in the table, so you just insert and increment the counter without checking anything.
  2. Insert with a lightweight transaction - use IF NOT EXISTS. This almost guarantees a consistent count; the exception is when the counter increment fails or times out, and in that rare case you may under- or over-count. This option allows concurrent clients.
  3. Read/write with consistency level ONE (if you run frequent inserts, a read may not yet see a recent write; also make sure there are NO concurrent clients doing the same thing).
  4. Read/write with consistency level QUORUM (a read will be consistent with the last write; however, you still have to make sure that there are NO concurrent clients).

One day I ran a simple test against a 3-node Cassandra cluster of m3.large instances (https://aws.amazon.com/ec2/instance-types/). There were 100 partitions and 100 inserts into each partition (10k inserts in total), all in a single thread, so this is not an IO-saturating test.

The schema:

CREATE TABLE IF NOT EXISTS parent_children (
  parentId uuid,
  childId uuid, 
  PRIMARY KEY (parentId, childId)
);

CREATE TABLE IF NOT EXISTS child_counters (
  parentId uuid,
  count counter, 
  PRIMARY KEY (parentId)
);
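
For illustration, options 2 and 4 against the schema above could look roughly like the Python sketch below. It assumes the DataStax Python driver, where `session.execute(query, parameters)` returns a result set exposing `was_applied` (for conditional writes) and `one()`. The function and constant names are made up for this example, and the consistency level is assumed to be configured on the session elsewhere (for example through an execution profile).

```python
# CQL statements for the parent_children / child_counters schema above.
# %s is the placeholder style the Python driver uses for non-prepared statements.
INSERT_IF_NOT_EXISTS = (
    "INSERT INTO parent_children (parentId, childId) "
    "VALUES (%s, %s) IF NOT EXISTS"
)
INSERT_PLAIN = (
    "INSERT INTO parent_children (parentId, childId) VALUES (%s, %s)"
)
SELECT_CHILD = (
    "SELECT childId FROM parent_children "
    "WHERE parentId = %s AND childId = %s"
)
INCREMENT_COUNT = (
    "UPDATE child_counters SET count = count + 1 WHERE parentId = %s"
)


def insert_child_lwt(session, parent_id, child_id):
    """Option 2: let Cassandra decide whether the row is new (hypothetical helper)."""
    result = session.execute(INSERT_IF_NOT_EXISTS, (parent_id, child_id))
    if not result.was_applied:
        # The row already existed, so the counter must stay untouched.
        return False
    # The increment is a separate statement; if it fails or times out,
    # the counter can still drift, as noted in option 2.
    session.execute(INCREMENT_COUNT, (parent_id,))
    return True


def insert_child_read_then_write(session, parent_id, child_id):
    """Options 3/4: check first, then write. Only safe with a single writer."""
    existing = session.execute(SELECT_CHILD, (parent_id, child_id)).one()
    if existing is not None:
        return False
    session.execute(INSERT_PLAIN, (parent_id, child_id))
    session.execute(INCREMENT_COUNT, (parent_id,))
    return True
```

Both helpers return True only when the counter was incremented. The read-then-write variant leaves a gap between the SELECT and the INSERT, which is why it needs the "no concurrent clients" condition; the IF NOT EXISTS variant closes that gap on the server side.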

The results:

Insertion Method    Latency per insert, ms
TRUSTED UNIQUE      1.6404
IF NOT EXISTS       4.2801
READ WRITE ONE      3.9382
READ WRITE QUORUM   3.7714

Note that QUORUM was unexpectedly a little faster than ONE, but that was probably within the error margin and/or due to specifics of the cluster topology.
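
To put the table in relative terms, the short Python sketch below divides each latency by the TRUSTED UNIQUE baseline. The numbers are copied from the table above; the dictionary and function names are only illustrative.

```python
# Per-insert latencies in milliseconds, copied from the results table.
LATENCY_MS = {
    "TRUSTED UNIQUE": 1.6404,
    "IF NOT EXISTS": 4.2801,
    "READ WRITE ONE": 3.9382,
    "READ WRITE QUORUM": 3.7714,
}


def relative_to(baseline_name, latencies=LATENCY_MS):
    """Return each method's latency as a multiple of the chosen baseline."""
    baseline = latencies[baseline_name]
    return {name: ms / baseline for name, ms in latencies.items()}


ratios = relative_to("TRUSTED UNIQUE")
for name, ratio in ratios.items():
    print(f"{name:<18} {ratio:.2f}x")
```

On these figures, IF NOT EXISTS costs roughly 2.6 times an unchecked insert, but only about 9% more than the read-then-write variant at consistency level ONE, so in this single-threaded test the lightweight transaction was not much more expensive than sending two queries.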

Upvotes: 7
