Cassandra: How to select data updated in the last 30 days

Question

We need to load the rows that were updated in the last 30 days from a table.

The potential solutions below do not work:

select * from XYZ_TABLE where WRITETIME(lastupdated_timestamp) > (TOUNIXTIMESTAMP(now())-42,300,000);

select * from XYZ_TABLE where lastupdated_timestamp > (TOUNIXTIMESTAMP(now())-42,300,000);
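A note on the arithmetic in these attempts: `TOUNIXTIMESTAMP` returns milliseconds since the Unix epoch, so 30 days is 30 * 86,400,000 = 2,592,000,000 ms. The 42,300,000 used above covers only 11.75 hours. CQL numeric literals also cannot contain thousands separators, and `WRITETIME` reports microseconds rather than milliseconds (it also cannot appear in a `WHERE` clause). The Python sketch below does the conversion client-side. `cutoff_millis` and `cutoff_micros` are hypothetical helper names, and `now` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day

# 30 days in milliseconds (the unit TOUNIXTIMESTAMP returns)
THIRTY_DAYS_MS = 30 * MS_PER_DAY

# The constant from the question, converted to hours
QUESTION_CONSTANT_HOURS = 42_300_000 / (60 * 60 * 1000)

def cutoff_millis(now: datetime, days: int = 30) -> int:
    """Epoch milliseconds for `now` minus `days` days (hypothetical helper; now must be tz-aware)."""
    return (now - timedelta(days=days) - EPOCH) // timedelta(milliseconds=1)

def cutoff_micros(now: datetime, days: int = 30) -> int:
    """Same cutoff in microseconds, the unit WRITETIME reports (hypothetical helper)."""
    return (now - timedelta(days=days) - EPOCH) // timedelta(microseconds=1)

if __name__ == "__main__":
    now = datetime(2018, 7, 6, tzinfo=timezone.utc)
    print(THIRTY_DAYS_MS)           # 2592000000
    print(QUESTION_CONSTANT_HOURS)  # 11.75
    print(cutoff_millis(now))       # 1528243200000
```

Integer division of two `timedelta` values keeps the result exact, which avoids the float rounding that `datetime.timestamp() * 1000` can introduce.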

The table has the following columns:

lastupdated_timestamp (with an index on this field)
lastupdated_userid (with an index on this field)

Any pointers?

Samantha Blowers · Accepted Answer

Unless your table was designed with this query in mind, the query has to scan every partition in the cluster. That becomes very costly as the dataset grows and will likely end in a timeout.

To run this query efficiently, XYZ_TABLE should have a primary key along these lines:

PRIMARY KEY ((update_month, update_day), lastupdated_timestamp)

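With this key, Cassandra knows exactly where to find the data. The (update_month, update_day) pair is the partition key, so each day's updates live in a single partition that Cassandra can locate directly, and the rows inside it are ordered by lastupdated_timestamp. You can then run queries like this to find the updates on a given day: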

SELECT * FROM XYZ_TABLE WHERE update_month = '07-18' AND update_day = 06;
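With a daily partition key, "the last 30 days" becomes one single-partition query per day bucket instead of one scan of the whole table. A minimal Python sketch of that fan-out follows. It assumes, based only on the example query, that `update_month` is a `'MM-YY'` text value (`'07-18'` for July 2018), that `update_day` is the integer day of the month, and that buckets are computed in UTC. `bucket_for`, `buckets_last_n_days` and `PER_DAY_QUERY` are hypothetical names. The same `bucket_for` would be used on the write path to fill in the two partition-key columns.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List, Tuple

# ASSUMPTION: bucket format inferred from the example query:
# update_month is 'MM-YY' text (e.g. '07-18'), update_day is the day of month as an int.
def bucket_for(d: date) -> Tuple[str, int]:
    """Partition key (update_month, update_day) for a date or UTC datetime."""
    return d.strftime("%m-%y"), d.day

def buckets_last_n_days(now: datetime, days: int = 30) -> List[Tuple[str, int]]:
    """Every daily bucket that overlaps [now - days, now], oldest first."""
    day = (now - timedelta(days=days)).date()
    buckets = []
    while day <= now.date():
        buckets.append(bucket_for(day))
        day += timedelta(days=1)
    return buckets

# One single-partition query per bucket; '?' are CQL bind markers.
# lastupdated_timestamp is the clustering column, so a range restriction on it
# is allowed and trims the oldest bucket to the exact cutoff.
PER_DAY_QUERY = (
    "SELECT * FROM XYZ_TABLE "
    "WHERE update_month = ? AND update_day = ? AND lastupdated_timestamp >= ?"
)

if __name__ == "__main__":
    now = datetime(2018, 7, 6, 12, 0, tzinfo=timezone.utc)
    buckets = buckets_last_n_days(now)
    print(len(buckets), buckets[0], buckets[-1])  # 31 ('06-18', 6) ('07-18', 6)
```

The query would be prepared once and executed per bucket, binding the bucket values and the cutoff timestamp, with whichever driver you already use. The per-bucket queries are independent, so they can run concurrently. A 30-day window touches 31 calendar days, so 31 partitions are read; the `>=` restriction discards the part of the oldest day that falls before the cutoff and is harmless on the newer days.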
