Cassandra: Time-series data and secondary indexes

Question

Let's say I have 100K users spread over 10K towns/localities. I receive time-series data for each user, say every 5 minutes. Town is not part of the key.

Is it good practice to create a secondary index on town in such a case?

regards

Andy Tolbert · Accepted Answer

With 10,000 distinct towns, a secondary index on town would have fairly high cardinality, which is definitely not an ideal scenario. I would recommend reading Richard Low's article 'The sweet spot for Cassandra secondary indexing'. Read performance would probably be less than ideal too: a secondary index is stored locally on each node, so a query by town alone cannot be routed to a single partition, and an index scan has to happen on replicas covering the whole token range.

In your case I would suggest denormalizing by creating a separate table called 'users_by_town' that would allow you to search for users by town.
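
A minimal sketch of what such a table might look like. Only the name 'users_by_town' comes from the answer; the column names, the types and the sample values are assumptions for illustration, and a keyspace is assumed to be selected already.

```cql
-- Hypothetical lookup table: town is the partition key, so all users
-- of one town live in a single partition. Column names and types are assumed.
CREATE TABLE IF NOT EXISTS users_by_town (
    town    text,
    user_id uuid,
    PRIMARY KEY ((town), user_id)
);

-- Written once per user (and again if the user moves), alongside
-- whatever table stores the time-series readings themselves.
INSERT INTO users_by_town (town, user_id)
VALUES ('Springfield', 123e4567-e89b-12d3-a456-426614174000);

-- Single-partition read: every user in one town.
SELECT user_id FROM users_by_town WHERE town = 'Springfield';
```

With 100K users over 10K towns, each partition would hold about 10 users on average, so these partitions stay small. The time-series rows can then be fetched per user from the main table.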

You could always try both cases and use request tracing to understand the costs of secondary indexes in this particular scenario.
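
As a rough sketch, the comparison could be run from cqlsh along these lines. The 'readings' table, its columns and the index name are hypothetical stand-ins for your actual time-series table, and 'users_by_town' is assumed to use town as its partition key.

```cql
-- Hypothetical time-series table keyed by user; town is a regular column.
CREATE TABLE IF NOT EXISTS readings (
    user_id uuid,
    ts      timestamp,
    town    text,
    value   double,
    PRIMARY KEY ((user_id), ts)
);

-- Assumed shape of the denormalized lookup table.
CREATE TABLE IF NOT EXISTS users_by_town (
    town text, user_id uuid, PRIMARY KEY ((town), user_id)
);

-- Case 1 setup: secondary index on town.
CREATE INDEX IF NOT EXISTS readings_town_idx ON readings (town);

-- cqlsh command: print a trace after each following statement.
TRACING ON;

-- Case 1: query through the secondary index.
SELECT * FROM readings WHERE town = 'Springfield';

-- Case 2: query the denormalized table.
SELECT user_id FROM users_by_town WHERE town = 'Springfield';

TRACING OFF;
```

Comparing the two traces on realistic data volumes (nodes contacted, elapsed time per step) should show what the index costs in this particular scenario.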
