Neikon
Neikon

Reputation: 169

How do I retrieve the last 10 inserts on a Cassandra table partitioned by timestamp?

I've seen a few questions of a similar problem already but none of them seem to apply to my particular case.

I have this table which contains the timestamp as primary key and a random value:

CREATE TABLE IF NOT EXISTS rand_keyspace.rands (
timestamp bigint,
randnumber float,
PRIMARY KEY (timestamp));

I would like to retrieve the last 10 inserted rows.

SELECT * FROM rand_keyspace.rands ORDER BY timestamp DESC LIMIT 10;

and I'm getting the classic error:

InvalidRequest: Error from server: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."

Note that I have no intention of clustering by randnumber or ever querying but that column.

Upvotes: 2

Views: 393

Answers (2)

Erick Ramirez
Erick Ramirez

Reputation: 16323

To add to Aaron's [fantastic] response, sorting by partition key isn't supported because it requires a full table scan. Cassandra is designed for OLTP workloads where you need to retrieve a single partition really quickly.

You have an analytics use case so instead of plain CQL, you need to use Spark to efficiently scan over multiple partitions. This is a common access pattern for use cases which need to identify what's trending say over the last 5 minutes. Cheers!

Upvotes: 1

Aaron
Aaron

Reputation: 57748

In its present state, that table can't support that query. Without a WHERE clause, it's going to default to the order of the hashed token value of the partition keys. That will never be sequential.

Depending on the number of random numbers generated over time, I'd recommend creating a new partition key based on a time component. So something like month or day might work.

PRIMARY KEY (month, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

Then this query would work:

SELECT * FROM rand_keyspace.rands
WHERE month='202110'
LIMIT 10;

With this PRIMARY KEY and CLUSTERING ORDER definition, you wouldn't need the ORDER BY clause, as that would happen by default (in descending order).

I wrote an article about how to solve for this a few years back: We Shall Have Order!. It might help you think through your approach.

Upvotes: 2

Related Questions