somenickname
somenickname

Reputation: 619

Secondary index on for low cardinality clustering column

Using Cassandra as db:

Say we have this schema primary_key((id1),id2,type) with index on type, because we want to query by id1 and id2.

Does query like SELECT * FROM my_table WHERE id1=xxx AND type='some type' going to perform well?

I wonder if we have to create and manage another table for this situation?

Upvotes: 2

Views: 151

Answers (1)

Christophe Schmitz
Christophe Schmitz

Reputation: 2996

The way you are planning to use secondary index is ideal (which is rare). Here is why:

  • you specify the partition key (id1) in your query. This ensures that only the relevant partition (node) will be queried, instead of hitting all the nodes in the cluster (which is not scalable)
  • You are (presumably) indexing an attribute of low cardinality (I can imagine you have maybe a few hundred types?), which is the sweet spot when using secondary indexes.

Overall, your data model should perform well and scale. Yet, if you look for optimal performances, I would suggest you use an additional table ((id1), type, id2).

Finale note: if you have a limited number of type, you might consider using solely ((id1), type, id2) as a single table. When querying by id1-id2, just issue a few parallel queries against the possible value of type.

The final decision needs to take into account your target latency, the disk usage (duplicating table with a different primary key is sometimes too expensive), and the frequency of each of your queries.

Upvotes: 1

Related Questions