Neer1009

Reputation: 314

Does adding a new value to, or updating an existing value in, a map in Cassandra create tombstones?

I was following this DataStax page, https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertMap.html, to learn how to update a map in Cassandra. But I suspect this might create unwanted tombstones in the following scenarios:

  1. UPDATE cycling.cyclist_teams SET teams = teams + {2009 : 'DSB Bank - Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e

Will adding a new value to the map (if the key 2009 did not already exist in the map) create any tombstones?

  2. UPDATE cycling.cyclist_teams SET teams = teams + {2009 : 'DSB Bank - Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2

Will updating an existing value in the map (if the key 2009 already existed in the map) create a tombstone for the old value, or any other kind of tombstone?

Upvotes: 1

Views: 569

Answers (2)

fg78nc

Reputation: 5232

It won't create a tombstone, because you are updating the collection with +. A tombstone would be created if you instead replaced the whole collection (a map in this instance) with a new one, like this:

UPDATE cycling.cyclist_teams SET teams = {2009 : 'DSB Bank - Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
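To make the difference between the two statements concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not Cassandra's storage format: the names Fragment, append_entries, overwrite_map, count_tombstones and read_map are hypothetical, and a whole write is folded into one record for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """One write to the map column, as a toy record (not Cassandra's real format)."""
    timestamp: int
    cells: dict = field(default_factory=dict)  # map key -> value written
    range_tombstone: bool = False              # True if the write first deletes the whole map


def append_entries(ts, entries):
    # Models: SET teams = teams + {...}  (only new cells are written)
    return Fragment(ts, dict(entries), range_tombstone=False)


def overwrite_map(ts, entries):
    # Models: SET teams = {...}  (old map is deleted, then new cells are written)
    return Fragment(ts, dict(entries), range_tombstone=True)


def count_tombstones(fragments):
    """Number of writes in this model that carried a tombstone."""
    return sum(1 for f in fragments if f.range_tombstone)


def read_map(fragments):
    """Merge all writes: the newest tombstone hides older writes,
    and for each key the cell with the latest timestamp wins."""
    cutoff = max((f.timestamp for f in fragments if f.range_tombstone), default=-1)
    latest = {}
    for f in fragments:
        if f.timestamp < cutoff:
            continue  # shadowed by a later whole-map overwrite
        for key, value in f.cells.items():
            if key not in latest or f.timestamp > latest[key][0]:
                latest[key] = (f.timestamp, value)
    return {key: value for key, (_, value) in latest.items()}


# Two appends to the same key: no tombstone, the newer value wins on read.
history = [
    append_entries(1, {2009: "old team"}),
    append_entries(2, {2009: "DSB Bank - Nederland bloeit"}),
]
print(count_tombstones(history), read_map(history))
```

In this model, only overwrite_map ever increases the tombstone count. Appending with + just adds cells, whether or not the key already existed.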

Cassandra always writes data in append-only mode. The only difference is that in the commit log the data is appended to the end of the log, while in the memtable it is written in the order of the partition key and clustering column(s). Memtable data is periodically flushed to an SSTable, so conflicting values for the same key may end up duplicated across SSTables. In fact, all inserts are upserts, unless you add conditions with lightweight transactions.

Both values will be read from a) the row cache (RAM), b) the memtable (RAM), or c) an SSTable (HDD/SSD), and on conflict the value with the latest timestamp is returned to the driver. Depending on your read consistency level (always for ALL, and depending on read_repair_chance for other consistency levels), the old values in the replicas' memtables (RAM) will be updated. The old (outdated) values are eventually removed by the SSTable (HDD/SSD) compaction process.
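The read path and compaction described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Cassandra's implementation: each SSTable or memtable is modelled as a plain dict, and Cell, reconcile and compact are hypothetical names.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # write time; a higher number means a newer write


def reconcile(sources, key):
    """Collect every stored version of `key` and return
    (newest value, number of versions that had to be read)."""
    versions = [src[key] for src in sources if key in src]
    if not versions:
        return None, 0
    newest = max(versions, key=lambda cell: cell.timestamp)
    return newest.value, len(versions)


def compact(sstables):
    """Merge several SSTables into one, keeping only the newest cell per key."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell.timestamp > merged[key].timestamp:
                merged[key] = cell
    return merged


# The old and new values for key 2009 were flushed to different SSTables.
sstable_1 = {2009: Cell("old team", 100)}
sstable_2 = {2009: Cell("DSB Bank - Nederland bloeit", 200)}

# Before compaction both versions are read and the older one is discarded.
print(reconcile([sstable_1, sstable_2], 2009))

# After compaction only the newest version remains on disk.
print(reconcile([compact([sstable_1, sstable_2])], 2009))
```

The second element of the returned tuple shows the extra work a read does while outdated values are still spread across SSTables.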

You can experiment and then retrieve table statistics to see if there are any tombstones by executing:

$CASSANDRA_HOME/bin/nodetool cfstats keyspace.table

Upvotes: 1

Aaron

Reputation: 57808

It won't create a tombstone (no delete or deliberate write of null), but it will "obsolete" the previous value.

This means that both the old and new values for 2009 will be retrieved at read time, and Cassandra will filter out all but the most recent. Also, depending on how much time has elapsed since the first write to teams, it is entirely possible that the old and new values were written to separate SSTable files, meaning that the read/reconciliation process will take longer.

So while this won't create a tombstone, it will have a similar effect: a large amount of obsoleted data (from repeated in-place writes/updates to the same key) will cause read performance to slow over time.

Upvotes: 3
