John D.

Reputation: 1701

Why doesn't Cassandra UPDATE violate the no-read-before-write rule?

I am confused by two seemingly contradictory statements about Cassandra:

  1. No reads before writes (presumably this is because writes are sequential whereas reads require scanning a primary key index)
  2. INSERT and UPDATE have identical semantics (stated in an older version of the CQL manual but presumably still considered essentially true)

Suppose I've created the following simple table:

CREATE TABLE data (
  id varchar PRIMARY KEY,
  names set<text>
);

Now I insert some values:

insert into data (id, names) values ('123', {'joe', 'john'});

Now if I do an update:

update data set names = names + {'mary'} where id = '123';

The results are as expected:

 id  | names
-----+-------------------------
 123 | {'joe', 'john', 'mary'}

Isn't this a case where a read has to occur before a write? The "cost" would seem to be the following:

  1. The cost of reading the column
  2. The cost of creating a union of the two sets (negligible here but could be noticeable with larger sets)
  3. The cost of writing the data with the key and new column data

An insert would merely do the last of these.

Upvotes: 4

Views: 375

Answers (1)

Carlo Bertuccini

Reputation: 20021

There is no need to read before writing.
Internally, each collection stores its data using one column per entry. When you add an entry to a collection, the operation touches only that single column*: if the column already exists it is overwritten, otherwise it is created (InsertOrUpdate). This is why each entry in a collection can have its own TTL and writetime.

*While this is transparent for Map and Set, an internal trick allows multiple columns with the same name inside a List.
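
To make the one-column-per-entry idea concrete, here is a minimal Python sketch. It is a toy model, not Cassandra's actual storage engine: the `ToyTable` class, its `cells` dictionary, and the `reads` counter are all hypothetical names invented for illustration. It only shows that adding to a set can be a blind upsert of individual cells, with the union materialising at read time.

```python
import itertools


class ToyTable:
    """Toy model (an assumption, not real Cassandra code): one cell per set element."""

    def __init__(self):
        # (row_key, column, element) -> write timestamp of that single cell
        self.cells = {}
        self.reads = 0  # counts how often existing data is read
        self._clock = itertools.count(1)

    def add_to_set(self, row_key, column, elements):
        """Blind upsert: write one cell per element without reading anything."""
        ts = next(self._clock)
        for element in elements:
            # Overwrites the cell if it exists, creates it otherwise.
            self.cells[(row_key, column, element)] = ts

    def read_set(self, row_key, column):
        """The set is assembled from its cells only when someone reads it."""
        self.reads += 1
        return {e for (k, c, e) in self.cells if k == row_key and c == column}


table = ToyTable()
table.add_to_set('123', 'names', {'joe', 'john'})  # models the INSERT on an empty row
table.add_to_set('123', 'names', {'mary'})         # models names = names + {'mary'}
reads_during_writes = table.reads                  # no read happened so far

result = table.read_set('123', 'names')
print(sorted(result))
```

In this model each element carries its own timestamp, which mirrors why entries can have individual writetime and TTL values. Re-adding an existing element simply overwrites its cell.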

Upvotes: 1
