Multiple Secondary index in cassandra performance in same partition

Question

I have table like this.

CREATE TABLE posts (
topic text
country text,
bookmarked text,
id uuid,
PRIMARY KEY (topic,id)
);

After that I have created secondary index on country and bookmarked as below.

CREATE INDEX posts_country ON posts (country);
CREATE INDEX posts_bookmarked ON posts (bookmarked);

Now I am querying on single partition with secondary index as below.

select * from posts where topic='cassandra' and country='india' and bookmarked='true' allow filtering;
select * from posts where topic='sql' and country='us' and bookmarked='true' allow filtering;

My question is if all the query is going to same partition(topic = cassandra or topic=sql) then allow filtering will query all the row or on particular partition? And how performance will be impacted?

Any suggestion on how I can handle such scenario if it is impacting performance.

Thanks.

Chaity · Accepted Answer

As partition key is mentioned, data with be searched from one partition. It would definitely be efficient than without mentioning partition key and only queried by secondary index column (as many nodes have to be queried) But performance impaction depends on your data set.

ALLOW FILTERING involves data filtering and thus may have unpredictable performance.

Filtering in data (specially large dataset) is likely to be very inefficient thus not preferable. But it depends how inefficient it would be.

If your partition is too big (too many rows in a single partition) and if you filter on a column that has the most unique value and filter data for getting a small data set is not efficient cause Cassandra loads large data and filters them out.

select * from posts where topic='cassandra' and country='india';

This query is efficient though as you have mentioned the partition key.

select * from posts where topic='cassandra' and country='india' and bookmarked='true' allow filtering;

Adding an index on bookmarked might improve the query performance.

Cassandra will then use the index with the highest selectivity to find the rows that need to be loaded. It will however not change anything regarding the need for ALLOW FILTERING, as it will still have to filter the loaded rows using the remaining predicate.

Please read the below article. I think it has the answers you require I guess :) https://www.datastax.com/dev/blog/allow-filtering-explained-2

Additionally, secondary index on very high or low cardinality is not efficient. As you have secondary index on bookmarked (data type is text) but if value is only 'true' or 'false' it will not be efficient. https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html

Multiple Secondary index in cassandra performance in same partition

Answers (1)

Related Questions