Cassandra - how to do group-by and limit query?

Question

The table data looks like this. Table has clustering order desc on timestamp, and primary key is (name, timestamp):

name - address - timestamp
John - J_Addr 1 - Jan 01, 2017
John - J_Addr 2 - Feb 05, 2017
Mark - M_Addr 1 - Jan 01, 2017
Mark - M_Addr 2 - Mar 05, 2017

Is there a way to get the latest address for each name? In above case, the expected result would be:

name - address - timestamp
John - J_Addr 2 - Feb 05, 2017
Mark - M_Addr 2 - Mar 05, 2017

Ashraful Islam · Accepted Answer

If you are using cassandra version >= 3.6 then you can use PER PARTITION LIMIT

Example :

SELECT * FROM table_name PER PARTITION LIMIT 1;

Else If you are inserting every value of timestamp from the current time then you can just create another table like below :

CREATE TABLE user_address (
    name text PRIMARY KEY,
    address text
);

Whenever you insert into base table also insert into the maintained table. you could use batch if you want to maintain atomicity between these table.

So every time you insert address for a user, address will be upsert. So you will get latest address

Else you have to scan all the row and group by limit from the client side

Cassandra - how to do group-by and limit query?

Answers (1)

Related Questions