Amit Kaneria

Reputation: 5808

Cassandra: How to select data updated in the last 30 days

We need to load the rows that were updated in the last 30 days from a table.

The potential solutions below do not work:

select * from XYZ_TABLE where WRITETIME(lastupdated_timestamp) > (TOUNIXTIMESTAMP(now())-42,300,000);

select * from XYZ_TABLE where lastupdated_timestamp > (TOUNIXTIMESTAMP(now())-42,300,000);

The table has the following columns:

lastupdated_timestamp (with an index on this field)
lastupdated_userid (with an index on this field)

Any pointers ...

Upvotes: 4

Views: 1095

Answers (1)

Samantha Blowers

Reputation: 251

Unless your table was designed with this query in mind, the query has to search every partition in the table. That becomes very costly once the dataset grows large and will probably end in a timeout.

To run this query efficiently, XYZ_TABLE should have a primary key along these lines:

PRIMARY KEY ((update_month, update_day), lastupdated_timestamp)

This way Cassandra knows exactly where to find the data: the month and day form partition buckets that it can locate quickly. You can then run a query like this to find the updates made on a given day:

SELECT * FROM XYZ_TABLE WHERE update_month = '07-18' AND update_day = '06';
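Covering the last 30 days with this layout means running one such query per day bucket. Below is a minimal Python sketch of how a client could generate those queries. It assumes the buckets are stored as text in the 'MM-YY' and 'DD' form suggested by the query above, and the helper names (bucket_for, buckets_for_last_days, queries_for_last_days) are made up for illustration. It only builds the CQL strings and does not talk to a cluster.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def bucket_for(ts: datetime) -> Tuple[str, str]:
    """Return the (update_month, update_day) bucket for a timestamp.

    Assumption: buckets are text values shaped like 'MM-YY' and 'DD'.
    """
    return ts.strftime("%m-%y"), ts.strftime("%d")


def buckets_for_last_days(now: datetime, days: int = 30) -> List[Tuple[str, str]]:
    """List the buckets for the `days` most recent calendar days, newest first."""
    return [bucket_for(now - timedelta(days=offset)) for offset in range(days)]


def queries_for_last_days(now: datetime, days: int = 30) -> List[str]:
    """Build one single-partition SELECT per day bucket."""
    template = (
        "SELECT * FROM XYZ_TABLE "
        "WHERE update_month = '{month}' AND update_day = '{day}';"
    )
    return [
        template.format(month=month, day=day)
        for month, day in buckets_for_last_days(now, days)
    ]


if __name__ == "__main__":
    # Print the 30 per-day queries for the current UTC date.
    for query in queries_for_last_days(datetime.now(timezone.utc)):
        print(query)
```

Each generated query hits exactly one partition, so the 30 of them can be run independently and the results merged on the client. If an exact 30-day cutoff matters, one more bucket can be added and the oldest query restricted with a range on lastupdated_timestamp, which works here because it is the clustering column.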

Upvotes: 7
