Why doesn't Cassandra have secondary indexes?

Question

Cassandra is positioned as a scalable and fast database. Why, in terms of technical details, can't those goals be accomplished with secondary indexes?

Aaron · Accepted Answer

Cassandra does indeed have secondary indexes. But secondary indexes don't work well in a distributed database, because each node holds only a subset of the overall dataset.

I previously wrote an answer which discussed the underlying details of secondary index queries:

How do secondary indexes work in Cassandra?

While it should help give you some understanding of what's going on, that answer is written from the context of first querying by a partition key. This is an important distinction, as secondary index usage within a partition should perform well.

The problem arises when you query only by a secondary index: Cassandra cannot guarantee that all of your data can be served by a single node. When this happens, the node coordinating the request has to query all the other nodes for the specified indexed values.

Essentially, instead of performing sequential reads from a single node, secondary index usage forces Cassandra to perform random reads from all nodes. Now you don't have just disk seek time, but also network time complicating things.
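
To make the difference concrete, here is a minimal toy model in Python. It is not how Cassandra is implemented: `ToyCluster`, its methods, and the `crc32`-modulo placement are all made up for illustration, and each "node" simply scans its rows instead of consulting a local index. It only shows that a lookup by partition key can be routed to one node, while a lookup by some other column has to ask every node.

```python
import zlib


class ToyCluster:
    """Hypothetical toy model: a row lives on the node its partition key hashes to."""

    def __init__(self, node_count):
        # Each node holds only its own subset of rows, keyed by partition key.
        self.nodes = [dict() for _ in range(node_count)]

    def _owner(self, partition_key):
        # Deterministic hash (assumption: simple modulo placement, not a token ring).
        return zlib.crc32(partition_key.encode("utf-8")) % len(self.nodes)

    def insert(self, partition_key, row):
        self.nodes[self._owner(partition_key)][partition_key] = row

    def get_by_partition_key(self, partition_key):
        """Return (row, nodes_contacted). The key tells us which node to ask."""
        node = self.nodes[self._owner(partition_key)]
        return node.get(partition_key), 1

    def find_by_column(self, column, value):
        """Return (rows, nodes_contacted). Without a key, every node is asked."""
        matches = []
        contacted = 0
        for node in self.nodes:
            contacted += 1
            matches.extend(r for r in node.values() if r.get(column) == value)
        return matches, contacted


cluster = ToyCluster(node_count=6)
for i in range(100):
    city = "Oslo" if i % 10 == 0 else "Rome"
    cluster.insert(f"user-{i}", {"id": f"user-{i}", "city": city})

row, hops_by_key = cluster.get_by_partition_key("user-42")
rows, hops_by_index = cluster.find_by_column("city", "Oslo")
```

In this sketch `hops_by_key` is always 1, while `hops_by_index` grows with the number of nodes, which is the scaling problem described above.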

The recommendation for Cassandra modeling is to duplicate your data into new tables to support the desired query. This adds some other complications with keeping the data in sync. But (when done correctly) it ensures that your queries can indeed be served by a single node. That's a tradeoff you need to make when building your model. You can have convenience or performance, but not both.
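
As a rough sketch of what "duplicate your data into new tables" means, the Python below keeps two in-memory "tables", each keyed for one query. The names `users_by_id`, `users_by_city` and `save_user` are hypothetical, and plain dicts stand in for Cassandra tables; the point is only that every write has to touch both copies, and that forgetting the cleanup step leaves stale data behind.

```python
from collections import defaultdict

# Hypothetical query tables: one per access pattern.
users_by_id = {}                      # partition key: user_id
users_by_city = defaultdict(dict)     # partition key: city, then user_id


def save_user(user_id, name, city):
    """Write the same row to both tables and keep them in sync."""
    old = users_by_id.get(user_id)
    if old is not None and old["city"] != city:
        # The user moved: remove the stale copy from the old city's partition.
        users_by_city[old["city"]].pop(user_id, None)
    row = {"user_id": user_id, "name": name, "city": city}
    users_by_id[user_id] = row
    users_by_city[city][user_id] = row


save_user("u1", "Ana", "Oslo")
save_user("u2", "Ben", "Oslo")
save_user("u1", "Ana", "Rome")   # update must be applied to both tables

oslo_users = sorted(users_by_city["Oslo"])
rome_users = sorted(users_by_city["Rome"])
```

Both lookups (`users_by_id["u1"]` and `users_by_city["Oslo"]`) go straight to a single key, which is the analogue of a query served by one partition. The extra bookkeeping inside `save_user` is the "keeping data in sync" cost mentioned above.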
