Avoiding filtering with a compound partition key in Cassandra

Question

I am fairly new to Cassandra and currently have the following table:

CREATE TABLE time_data (
    id int,
    secondary_id int,
    timestamp timestamp,
    value bigint,
    PRIMARY KEY ((id, secondary_id), timestamp)
);

The compound partition key (with secondary_id) is necessary in order to not violate max partition sizes.

The issue I am running into is that I would like to run the query SELECT * FROM time_data WHERE id = ?. Because the table has a compound partition key, this query requires filtering. I realize this queries a lot of data and partitions, but it is necessary for the application. For reference, id has relatively low cardinality and secondary_id has high cardinality.

What is the best way around this? Should I simply allow filtering on the query? Or is it better to create a secondary index like CREATE INDEX id_idx ON time_data (id)?

Mandraenke · Accepted Answer

You will need to specify the full partition key in queries (ALLOW FILTERING will hurt performance badly in most cases).

One way to go, if you know all secondary_id values (you could add a table to track them if necessary), is to do the work in your application: query all (id, secondary_id) pairs and process the results afterwards. This has the disadvantage of being more complex, but the advantage that it can be done with async queries and in parallel, so many nodes in your cluster participate in processing your task.

See also https://www.datastax.com/dev/blog/java-driver-async-queries
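As a rough illustration of the fan-out described above, here is a minimal Python sketch. It is not taken from the linked article. The tracking table time_data_partitions, the function fetch_all_for_id, and the fetch_partition callback are all hypothetical names introduced here. The callback is assumed to run one query for a full partition key and return its rows, and a thread pool stands in for whatever async mechanism your driver provides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tracking table (an assumption, not part of the original schema):
#   CREATE TABLE time_data_partitions (
#       id int, secondary_id int, PRIMARY KEY (id, secondary_id));
LIST_SECONDARY_IDS = "SELECT secondary_id FROM time_data_partitions WHERE id = ?"

# Each data query names the full partition key, so no filtering is required.
SELECT_PARTITION = "SELECT * FROM time_data WHERE id = ? AND secondary_id = ?"


def fetch_all_for_id(id_, secondary_ids, fetch_partition, max_workers=8):
    """Query every (id, secondary_id) partition in parallel and merge the rows.

    fetch_partition(id_, secondary_id) is supplied by the caller; it is assumed
    to execute SELECT_PARTITION for one partition and return an iterable of rows.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as secondary_ids.
        per_partition = pool.map(
            lambda sid: fetch_partition(id_, sid), secondary_ids
        )
        # Flatten the per-partition results into a single list of rows.
        return [row for rows in per_partition for row in rows]
```

In a real application, fetch_partition would wrap a prepared statement executed through your driver's session, and secondary_ids would come from running LIST_SECONDARY_IDS first. If the driver has native async execution, that is likely a better fit than threads; the structure of the fan-out stays the same.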
