jens

Reputation: 17565

Cassandra = Atomicity/Isolation of Column Updates on a Single Row on a Single Node?

Sorry for having to ask something about Cassandra again; I would very much appreciate your advice:

I have read this: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic and am completely lost, wondering:

Is it really true that in Cassandra, WRITES on ONE SINGLE NODE for ONE ROW-KEY (with MANY COLUMNS to be updated in the SAME COLUMNFAMILY, using batch_mutate) are NOT ISOLATED against a READ on the SAME NODE of the SAME ROW-KEY'S COLUMNS, so there is no guarantee that a read does not see "partly changed data"? Example:

Current Status:     [KEY=1 , ColumnName=A with Value=A , ColumnName=B with Value=B] on Node 1
Client A => Writes: [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=D] on Node 1

ATOMICITY:

According to the Cassandra docs, writes are atomic for the client doing the write: the write above will either succeed completely or fail completely!? So something like [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=B] (= half of the column updates succeeded, but the other half was not applied/failed) cannot be the RESULT of the write in case of an error? Is this correct?
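The all-or-nothing part can be pictured with a small in-memory sketch (plain Python, purely illustrative; this is not Cassandra code, and `apply_batch` is a made-up helper):

```python
# Illustrative sketch of per-row atomicity (not Cassandra code):
# the whole batch is staged first and published in one step, so a
# failure while staging leaves the row completely untouched.
def apply_batch(row, mutation):
    staged = dict(row)       # work on a copy of the row
    staged.update(mutation)  # any failure here leaves `row` unchanged
    row.clear()
    row.update(staged)       # publish the complete result at once

row = {"A": "A", "B": "B"}
apply_batch(row, {"A": "C", "B": "D"})
print(row)  # {'A': 'C', 'B': 'D'}, never the half-applied {'A': 'C', 'B': 'B'}
```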

ISOLATION:

Is it really true that even on ONE SINGLE NODE (here Node 1) writes are not isolated from someone reading the same ROW on the same node? As described above, if Client A has applied half of the columns to be changed (here ColumnName=A with Value=C), is it really true that another Client B connecting to Node 1 will then indeed see the record as

Client B => Reads:  [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=B] on Node 1

And some milliseconds later, reading again, it will see:

Client B => Reads:  [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=D] on Node 1
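This interleaving can be reproduced with a small thread-based simulation (plain Python, not Cassandra; the sleeps exist only to force the timing):

```python
import threading
import time

row = {"A": "A", "B": "B"}

def slow_batch_write():
    # Columns are applied one at a time, with no row-level lock.
    row["A"] = "C"
    time.sleep(0.2)        # a reader can sneak in between the two columns
    row["B"] = "D"

writer = threading.Thread(target=slow_batch_write)
writer.start()
time.sleep(0.05)
partial = dict(row)        # Client B's first read: half-applied state
writer.join()
final = dict(row)          # Client B's second read: the complete write
print(partial, final)      # {'A': 'C', 'B': 'B'} then {'A': 'C', 'B': 'D'}
```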


Why are updates not isolated on a per-node basis?

To me this seems quite easy and cheap to provide. Why is there no in-memory lock held on Node 1 recording that KEY=1 is currently in the process of being updated, so that a read can wait for this write to finish? This would be only a very small overhead, as the lock is held locally in memory on Node 1, and it could be configurable: the reading client either honours the lock or simply reads a dirty value. So it would be something like a "configurable isolation level": if I need high performance I ignore/disable locks, and if I need isolation on a per-node basis and accept the negative performance impact, I wait for the in-memory lock (on Node 1) to be released. (Note, I am not talking about clustered/distributed locks, but about locks that guarantee on one single machine that a write is isolated on a per-row-key basis!)
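The proposed per-node lock could be sketched roughly like this (hypothetical, plain Python; Cassandra does not implement this, and `write_row`/`read_row` are made-up names):

```python
import threading
from collections import defaultdict

# One in-memory lock per row key, local to this single node (illustrative).
row_locks = defaultdict(threading.Lock)
rows = {1: {"A": "A", "B": "B"}}

def write_row(key, mutation):
    with row_locks[key]:           # readers honouring the lock must wait
        rows[key].update(mutation)

def read_row(key, dirty=False):
    if dirty:                      # "configurable isolation": accept a dirty read
        return dict(rows[key])
    with row_locks[key]:           # otherwise wait for any in-flight write
        return dict(rows[key])

write_row(1, {"A": "C", "B": "D"})
print(read_row(1))
```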

Or is isolation different for "changing existing columns" versus operations that "append/add columns"? So that changing columns (as in the example above) is isolated, but adding new columns is not? From my point of view, changing existing columns must be isolated/atomic... Adding columns is not so much required to be isolated...

The reason I am asking: if things like the above can happen, and reads really do see partially changed records, what use cases are then legitimate for NoSQL/Cassandra? It means any combination of column data can exist on a per-row basis, as the columns might be in any intermediate read/write state. I hardly know of any data or use case that is allowed to change "arbitrarily" within a row.

Thank you very much!!! jens

Upvotes: 3

Views: 2003

Answers (3)

Sid

Reputation: 440

A chat from the IRC log:

itissid: Ok so http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic says that it's a special case. But if we do a normal write, are they isolated?

thobbs: the column is the unit of isolation nothing above that is isolated (yet)

itissid: Ok gotcha

thobbs: there's work to isolate writes to a single row

driftx: it's done for 1.1

Upvotes: 0

jbellis

Reputation: 19377

Why is there no in memory lock held on Node 1, that KEY=1 is currently in the process of being updated so a read can wait to finish this write?

Because Cassandra heavily emphasizes denormalization for performance (distributed joins do not scale, and yes, I'm using "scale" correctly here -- distributed joins are O(N) in the number of machines in the cluster), write volume to a "materialized view" row can be VERY high. So row-level locking would introduce unacceptable contention for many real-world workloads.

Upvotes: 5

DNA

Reputation: 42597

The page you linked to says:

"As a special case, mutations against a single key are atomic but not isolated. Reads which occur during such a mutation may see part of the write before they see the whole thing."

I'm not sure of the reason for this, but I suspect that the required locking would be too coarse-grained and would affect performance too much. Bear in mind that all updates are written first to a commit log and to an in-memory memtable (which is flushed to SSTables on disk later), so purely memory-based locks are not necessarily helpful.

A few use cases where this does not matter:

  • Systems where data is written, perhaps added to, but not updated
  • Systems where reads are known to be separated in time from writes
  • Systems where the values of columns are not tightly coupled (and you can arrange for this to be so if any of your values can be aggregated into a single column value)
  • Systems where the data consistency isn't critical anyway, and where users will often refresh their views
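The third point, aggregating coupled values into one column, works because a single column is the unit of isolation. A hedged sketch in plain Python (with "payload" as a made-up column name), serialising both values so the update becomes a single-column write:

```python
import json

# Coupled values A and B packed into ONE column ("payload" is a made-up
# name), so updating both is a single-column write and therefore isolated.
row = {"payload": json.dumps({"A": "A", "B": "B"})}

# One column write updates both values together; a concurrent reader
# sees either the old pair or the new pair, never a mix.
row["payload"] = json.dumps({"A": "C", "B": "D"})

print(json.loads(row["payload"]))
```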

Upvotes: 3
