soumyadeep sarkar

Reputation: 572

How does Cassandra manage insertion, update, and deletion of columns and column data internally?

I am confused about some concepts in Cassandra.

  1. What do we actually mean by updating a Cassandra row? Does it mean adding more columns, updating the value of a column, or both?
  2. When we add more columns to a row, is the previous row in the SSTable invalidated and a new row entry inserted in the SSTable with the newly added columns?
  3. Since an SSTable is immutable, will each update of column data, addition of a column, or deletion of column data invalidate the previous row and insert a new row with all the previous columns plus the new column?

Please help.

Upvotes: 2

Views: 2047

Answers (2)

Lyuben Todorov

Reputation: 14153

What do we actually mean by updating a Cassandra row? Does it mean adding more columns, updating the value of a column, or both?

In Cassandra, updating a row and inserting a row are the same operation; both lead to adding data to a memtable (in-memory sstable), which is later flushed to disk and becomes an SSTable (a log line is also written to the commit log if persistent writes are enabled). By the way, in Cassandra terms a column is the same as a cell and a row is known as a partition, which you might find useful if you do any further reading. If you insert a column which already exists, e.g.:

INSERT INTO db.tbl (id, value) VALUES ('text_id1', 'some text as a value');
INSERT INTO db.tbl (id, value) VALUES ('text_id1', 'some text as a value');

You'll end up with one partition, since the first insert is overwritten by the second. In other words, inserting a partition with a duplicate key overwrites the previous one (the overwrite is decided by the timestamp at the time of insert: last write wins).
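To make "last write wins" concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's actual code: the `Memtable` type and the `write` helper are hypothetical names made up for illustration, timestamps are plain integers, and tie handling is simplified.

```python
# Toy model of last-write-wins on a single cell; not Cassandra's real implementation.
from typing import Dict, Tuple

# (partition key, column name) -> (write timestamp, value)
Memtable = Dict[Tuple[str, str], Tuple[int, str]]


def write(memtable: Memtable, key: str, column: str, value: str, timestamp: int) -> None:
    """Insert and update are the same operation: keep the cell with the newest timestamp."""
    current = memtable.get((key, column))
    # Simplification: on equal timestamps the later arrival wins in this sketch.
    if current is None or timestamp >= current[0]:
        memtable[(key, column)] = (timestamp, value)


memtable: Memtable = {}
write(memtable, "text_id1", "value", "some text as a value", timestamp=1)
write(memtable, "text_id1", "value", "some newer text", timestamp=2)
# A write that arrives late but carries an older timestamp does not win.
write(memtable, "text_id1", "value", "stale text", timestamp=1)

print(memtable)
```

After the three writes there is still a single cell for `text_id1`, holding the value written with the highest timestamp.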

When we add more columns (cells) to a row (partition), is the previous row in the SSTable invalidated and a new row entry inserted in the SSTable with the newly added columns?

For CQL, existing rows will simply contain a null value for the newly added column. No invalidation happens; you can alter schemas as you please. If you delete a column, its data will be removed during compaction in order to reclaim disk space.

Since an SSTable is immutable, will each update of column data, addition of a column, or deletion of column data invalidate the previous row and insert a new row with all the previous columns plus the new column?

Kind of. SSTables are merged into larger SSTables when necessary; how this is done depends on the compaction strategy being used. There are two flavours: size-tiered and levelled compaction. Covering how they work is a whole separate question that has been answered by people who are smarter than me, so have a read here.
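As a rough illustration of what merging SSTables means, here is a small Python sketch. It ignores the compaction strategy entirely and only shows the merge itself. The `compact` function and the dict-based `SSTable` type are hypothetical, a deleted cell is modelled as a value of `None` (a tombstone), and tombstones are dropped immediately, which is a simplification of what Cassandra does.

```python
# Toy model of merging immutable SSTables; not Cassandra's real compaction code.
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, Optional[str]]        # (timestamp, value); value None marks a delete
SSTable = Dict[Tuple[str, str], Cell]   # (partition key, column) -> cell, never modified


def compact(sstables: List[SSTable]) -> SSTable:
    """Build a new SSTable that keeps only the newest cell for each (key, column)."""
    merged: SSTable = {}
    for table in sstables:
        for cell_key, cell in table.items():
            if cell_key not in merged or cell[0] > merged[cell_key][0]:
                merged[cell_key] = cell
    # Simplification: drop deleted cells right away to reclaim space.
    return {k: c for k, c in merged.items() if c[1] is not None}


older: SSTable = {("id1", "value"): (1, "a"), ("id1", "extra"): (1, "x")}
newer: SSTable = {
    ("id1", "value"): (2, "b"),     # update of an existing column
    ("id1", "extra"): (3, None),    # delete of an existing column
    ("id1", "added"): (2, "new"),   # newly added column
}

compacted = compact([older, newer])
print(compacted)
```

Note that the input tables are never changed. The newer table only holds the cells that were written later, not a full copy of the row, and the complete row only comes together when the tables are merged.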

Upvotes: 2

Adam Holmberg

Reputation: 7365

Updating is covered here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_write_update_c.html

As you note, SSTables are immutable, so you're probably wondering what happens when a later write supersedes data already in an SSTable. The storage engine reads from all SSTables that might have data for a requested row (as determined by the bloom filter for each SSTable). Understanding the read path might clarify this for you: http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_reads_c.html Specifically: http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_read_path_c.html
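A very reduced sketch of that read-side merge, in Python, might look like the following. Everything here is hypothetical and simplified: `ToySSTable` and `read_row` are made-up names, an exact set of keys stands in for the bloom filter (a real bloom filter can return false positives), and the memtable and caches are left out.

```python
# Toy model of reading one row from several immutable SSTables; not Cassandra's real read path.
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, Optional[str]]  # (timestamp, value); value None marks a delete


class ToySSTable:
    def __init__(self, rows: Dict[str, Dict[str, Cell]]) -> None:
        self.rows = rows
        # Exact key set used as a stand-in for a per-SSTable bloom filter.
        self.key_filter: Set[str] = set(rows)

    def might_contain(self, key: str) -> bool:
        return key in self.key_filter


def read_row(sstables: List[ToySSTable], key: str) -> Dict[str, str]:
    """Collect cells from every candidate SSTable and keep the newest one per column."""
    newest: Dict[str, Cell] = {}
    for table in sstables:
        if not table.might_contain(key):
            continue  # skip SSTables that cannot hold this row
        for column, cell in table.rows.get(key, {}).items():
            if column not in newest or cell[0] > newest[column][0]:
                newest[column] = cell
    # Deleted cells are hidden from the result.
    return {column: value for column, (_, value) in newest.items() if value is not None}


tables = [
    ToySSTable({"id1": {"value": (1, "old"), "extra": (1, "x")}}),
    ToySSTable({"id2": {"value": (1, "other row")}}),
    ToySSTable({"id1": {"value": (5, "new"), "extra": (6, None)}}),
]

row = read_row(tables, "id1")
print(row)
```

The superseded cell in the first table is still on disk, but the read only returns the cell with the newest timestamp.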

Upvotes: 0
