cassandra indexing multiple columns

Question

The DataStax documentation talks about creating more than one secondary index here. But when I have to query with a WHERE clause that uses both indexes, the documentation suggests using ALLOW FILTERING. 1) I am worried about using ALLOW FILTERING in production, and 2) if I do use ALLOW FILTERING, doesn't that defeat the whole purpose of those indexes in a scenario where I always have to use both of them together?

A possible solution seems to be custom indexes on both columns, but the Apache documentation here is a bit vague and says nothing about their performance.

So what is the suggested approach when I need to query with multiple secondary indexes? Any opinions on solving this will be helpful.

EDIT1: A view of my Cassandra table is available at this link, represented as a Java class. I have to query using where col1='val1' and col2='val2' and col3='val3'

EDIT2: I did think about creating a new column holding the data of col1, col2 and col3, something like newcol='val1val2val3', so that I can create a single secondary index on newcol and do away with this conundrum, but it seems more of a hack than a strategic solution. Any comments on this plan will be appreciated. PS: This newcol will have a medium cardinality.

EDIT3: I did find good info on secondary indexes and ALLOW FILTERING here, which does seem to help.

Chris Lohfink · Accepted Answer

1) You should be. I highly recommend avoiding secondary indexes and ALLOW FILTERING; consider them advanced features for corner cases.

2) It can be more efficient with the index, but it is still horrible, and horrible in new ways as well. There are very few scenarios where secondary indexes are acceptable. There are very few scenarios where ALLOW FILTERING is acceptable. You are looking at an overlap of the two.

Maybe take a step back. You're building POJOs to represent objects and trying to map them into Cassandra. The approach you should take when data modeling with Cassandra is to think of the queries you are going to make and design tables to match those - not the data. It is normal to end up with multiple tables that you update on every change (disk space and writes are cheap) so that your reads can hit one partition efficiently and get everything you need in one hit. Denormalize the data; Cassandra is not relational, and third normal form is generally a bad thing here.
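To make the "one table per query" idea concrete, here is a rough sketch rather than a drop-in solution. It uses a small in-memory Python stand-in for two Cassandra tables that hold the same rows: one keyed by id and one keyed by the three columns from the question. The names `items_by_id`, `items_by_cols`, `payload` and `ItemStore` are made up for illustration, and the CQL in the comments assumes text columns and an `id` column that the question does not actually show.

```python
from collections import defaultdict

# Roughly what the query table could look like in CQL (hypothetical names and
# column types, shown for orientation only):
#   CREATE TABLE items_by_cols (
#       col1 text, col2 text, col3 text, id uuid, payload text,
#       PRIMARY KEY ((col1, col2, col3), id));
#   SELECT * FROM items_by_cols WHERE col1 = ? AND col2 = ? AND col3 = ?;


class ItemStore:
    """In-memory stand-in for two denormalized tables holding the same rows."""

    def __init__(self):
        # "Table" keyed by id, for lookups of a single item.
        self.items_by_id = {}
        # "Table" keyed by (col1, col2, col3), built for the query in the question.
        self.items_by_cols = defaultdict(dict)

    def save(self, item_id, col1, col2, col3, payload):
        """Write the row to both tables, as the application would on every change."""
        row = {"id": item_id, "col1": col1, "col2": col2, "col3": col3,
               "payload": payload}
        old = self.items_by_id.get(item_id)
        if old is not None:
            # The key columns may have changed, so drop the stale copy first.
            old_key = (old["col1"], old["col2"], old["col3"])
            self.items_by_cols[old_key].pop(item_id, None)
        self.items_by_id[item_id] = row
        self.items_by_cols[(col1, col2, col3)][item_id] = row

    def find_by_cols(self, col1, col2, col3):
        """Read one 'partition' directly; no index and no filtering involved."""
        return list(self.items_by_cols.get((col1, col2, col3), {}).values())


store = ItemStore()
store.save(1, "val1", "val2", "val3", "first")
store.save(2, "val1", "val2", "other", "second")
matches = store.find_by_cols("val1", "val2", "val3")
print([row["id"] for row in matches])
```

The cost is that every write touches both tables and the application has to keep them in sync, which is the trade the answer describes. With only medium cardinality on the combined key, it would also be worth checking how large each partition can grow before committing to this layout.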
