Cassandra multiple column sorting design

Question

Assume an object like the following:

performance {
    userid,
    date,
    score1,
    score2,
    score3,
    ...
}

I want to be able to query performances sorted by any one of the score fields. Should I create a separate table for each score field, with that score as part of the compound key, or is there a better way to do this?

I hesitate to duplicate performance objects for each score field, as there may be many score fields.

Rock · Accepted Answer

In the performance table, you could consider creating secondary indexes on score1, score2 and score3. However, Cassandra's built-in secondary indexes work best on a column family with many rows that contain the indexed value. So you have to weigh your use case, and avoid using secondary indexes to scan a huge volume of records for a small number of results. A secondary index also only helps filter rows by an indexed value; it does not return them in sorted order. For the purpose of sorting, this is not a good solution.

As a best practice, roughly one table per query pattern is recommended. Data duplication is fine. It also helps to identify the most frequent query patterns and isolate the less frequent ones. Some queries might be executed only a few thousand times, while others run a billion times. Also consider which queries are sensitive to latency and which are not. When modelling, keep two high-level rules in balance: spread data evenly around the cluster and minimise the number of partitions read.
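Since there may be many score fields, the per-query tables do not have to be written by hand. The following is a minimal sketch, not a definitive schema: it generates the CQL for one table per score column, with that score as the first clustering column so that rows within a partition are stored already sorted. Partitioning by date, the column types, and the performance_by_<score> naming scheme are all assumptions made for illustration; pick the partition key that matches how you actually query.

```python
# Sketch: one CQL table per score column ("one table per query pattern").
# Assumptions (not from the original post): rows are partitioned by date,
# scores are ints, userid is text, tables are named performance_by_<score>.

SCORE_COLUMNS = ["score1", "score2", "score3"]


def create_table_cql(sort_column, score_columns=SCORE_COLUMNS):
    """Return a CREATE TABLE statement clustered by sort_column, highest first."""
    other_scores = [c for c in score_columns if c != sort_column]
    column_defs = ["date date", f"{sort_column} int", "userid text"]
    column_defs += [f"{c} int" for c in other_scores]
    columns = ",\n    ".join(column_defs)
    return (
        f"CREATE TABLE IF NOT EXISTS performance_by_{sort_column} (\n"
        f"    {columns},\n"
        # userid is a clustering column so equal scores do not overwrite each other
        f"    PRIMARY KEY ((date), {sort_column}, userid)\n"
        f") WITH CLUSTERING ORDER BY ({sort_column} DESC, userid ASC);"
    )


def insert_batch_cql(score_columns=SCORE_COLUMNS):
    """Return a batch that writes the same performance row to every table."""
    columns = ["userid", "date"] + list(score_columns)
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    inserts = [
        f"  INSERT INTO performance_by_{c} ({column_list}) VALUES ({placeholders});"
        for c in score_columns
    ]
    return "BEGIN BATCH\n" + "\n".join(inserts) + "\nAPPLY BATCH;"


def top_n_cql(sort_column, n):
    """Return a query for the n best rows of one date partition, by sort_column."""
    return (
        f"SELECT * FROM performance_by_{sort_column} "
        f"WHERE date = ? LIMIT {int(n)};"
    )


if __name__ == "__main__":
    for column in SCORE_COLUMNS:
        print(create_table_cql(column), end="\n\n")
    print(insert_batch_cql(), end="\n\n")
    print(top_n_cql("score2", 10))
```

Each read then touches a single partition and needs no ORDER BY, because the clustering order already matches the query. The cost is on the write side: every performance is written once per score table, and because a clustering column cannot be updated in place, changing a score means deleting the old row and inserting a new one in the table sorted by that score.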

Please see also: Basic Rules of Cassandra Data Modeling
