Reputation: 270
I know that writes take roughly a 10% performance hit when using a materialized view, but I would like to know (and haven't found any clue about it yet) whether there is a repercussion on the base table's delete optimization when doing a big delete based on the partition key.
Here is a case example:
CREATE TABLE demo.a_simple_table (
    year int,
    fulldate date,
    ref1 text,
    ref2 text,
    data blob,
    PRIMARY KEY ((year), fulldate, ref1, ref2)
);
CREATE MATERIALIZED VIEW demo.a_simple_table_view
AS SELECT year, fulldate, ref1, ref2, data
FROM demo.a_simple_table
WHERE ref1 IS NOT NULL AND year IS NOT NULL AND fulldate IS NOT NULL AND ref2 IS NOT NULL
PRIMARY KEY ((ref1), year, fulldate, ref2)
WITH CLUSTERING ORDER BY (year DESC, fulldate DESC, ref2 ASC);
From what I understand and what I have been told, when we do the following:
DELETE FROM demo.a_simple_table WHERE year = 2017;
Cassandra marks only one partition-level tombstone, so we don't perform 100 deletes if there are 100 rows in the table under the partition key value 2017.
But since the materialized view has to find each of those rows to delete in its own table, what does the delete cost become?
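For reference, here is a minimal sketch of how the base-table side of this could be checked on a test cluster; the sample values are placeholders and the SSTable path is elided:
INSERT INTO demo.a_simple_table (year, fulldate, ref1, ref2, data)
VALUES (2017, '2017-01-01', 'a', 'x', 0xCAFE);
INSERT INTO demo.a_simple_table (year, fulldate, ref1, ref2, data)
VALUES (2017, '2017-01-02', 'b', 'y', 0xBEEF);
DELETE FROM demo.a_simple_table WHERE year = 2017;
-- then, from a shell:
--   nodetool flush demo a_simple_table
--   sstabledump <path to the flushed SSTable>
-- the base table's SSTable should show a single partition-level deletion
-- marker for year = 2017, not one tombstone per row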
Upvotes: 2
Views: 1900
Reputation: 2101
A delete operation is no different from an insert: http://www.doanduyhai.com/blog/?p=1930
From https://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views:
When a deletion occurs, the materialized view will query all of the deleted values in the base table and generate tombstones for each of the materialized view rows, because the values that need to be tombstoned in the view are not included in the base table’s tombstone...
Basically, the "hit" is as if you tried to insert all of the values from the base table partition again, and reads will also take a hit because of the increased number of tombstones in the materialized view.
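To make that concrete for the example above, here is a rough sketch of what the view maintenance amounts to; Cassandra does this internally, so these are not statements you would (or could) run yourself, and the ref1/fulldate/ref2 values are placeholders read back from the deleted base partition:
-- one partition-level tombstone on the base table:
DELETE FROM demo.a_simple_table WHERE year = 2017;
-- ...but, conceptually, one row-level tombstone per base row in the view:
DELETE FROM demo.a_simple_table_view
WHERE ref1 = 'a' AND year = 2017 AND fulldate = '2017-01-01' AND ref2 = 'x';
DELETE FROM demo.a_simple_table_view
WHERE ref1 = 'b' AND year = 2017 AND fulldate = '2017-01-02' AND ref2 = 'y';
-- ...and so on for every row that existed under year = 2017,
-- which first requires reading back the whole deleted partition
So a partition delete that costs a single tombstone on the base table turns into a read of that partition plus one tombstone per row in the view.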
Upvotes: 1