We need to load the rows that were updated in the last 30 days from a table.
The candidate queries below do not work:
select * from XYZ_TABLE where WRITETIME(lastupdated_timestamp) > (TOUNIXTIMESTAMP(now())-42,300,000);
select * from XYZ_TABLE where lastupdated_timestamp > (TOUNIXTIMESTAMP(now())-42,300,000);
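For reference, here is what a 30-day window works out to in the units these functions use. toUnixTimestamp() returns milliseconds since the Unix epoch, and writetime() returns the cell's write timestamp, which by convention is in microseconds. The offset 42,300,000 matches neither (as milliseconds it is under 12 hours), and CQL numeric literals cannot contain thousands separators. A minimal Python sketch of the arithmetic; the variable names are illustrative only:

```python
import time

# 30-day window expressed in the units the two CQL functions use:
# toUnixTimestamp() -> milliseconds, writetime() -> microseconds (by convention).
DAYS = 30
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

window_ms = DAYS * SECONDS_PER_DAY * 1_000      # compare with toUnixTimestamp()
window_us = DAYS * SECONDS_PER_DAY * 1_000_000  # compare with writetime()

# Client-side cutoff: "now" minus 30 days, in milliseconds since the epoch.
now_ms = int(time.time() * 1_000)
cutoff_ms = now_ms - window_ms

print(f"30 days = {window_ms:,} ms = {window_us:,} us")
```

It prints 30 days = 2,592,000,000 ms = 2,592,000,000,000 us.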
The relevant columns are:
lastupdated_timestamp (with an index on this column)
lastupdated_userid (with an index on this column)
Any pointers would be appreciated.
Unless your table was designed with this query in mind, the query has to search every partition in the table, which becomes very expensive once the dataset grows large and will probably time out.
To run this query efficiently, XYZ_TABLE should have a primary key along these lines:
PRIMARY KEY ((update_month, update_day), lastupdated_timestamp)
This tells Cassandra exactly where to find the data: the month and day buckets form the partition key, so it can go straight to the right partition, and the rows inside it are sorted by lastupdated_timestamp. You can then run queries like this one to find the updates for a given day:
SELECT * FROM XYZ_TABLE WHERE update_month = '07-18' AND update_day = '06';
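To cover a 30-day window with this schema, the client queries one (update_month, update_day) partition per day and merges the results. The Python sketch below only computes the bucket keys to bind to a prepared statement. It assumes, hypothetically, that update_month is stored as 'MM-YY' text and update_day as 'DD' text, reading the example above as July 2018, day 6; bucket_for() and buckets_last_n_days() are illustrative names, not part of any library. Reads and writes must agree on the bucket format, so adjust bucket_for() to whatever your writes actually use.

```python
from datetime import date, timedelta

# CQL template with bind markers; bind each (month, day) tuple to a
# prepared statement in your driver of choice.
QUERY = "SELECT * FROM XYZ_TABLE WHERE update_month = ? AND update_day = ?"


def bucket_for(day: date) -> tuple[str, str]:
    """Map a calendar day to its hypothetical partition key.

    Assumed format: update_month = 'MM-YY', update_day = 'DD' (both text).
    The same function should be used when writing rows.
    """
    return day.strftime("%m-%y"), day.strftime("%d")


def buckets_last_n_days(n: int, today: date) -> list[tuple[str, str]]:
    """Return the partition keys for `today` and the n - 1 days before it."""
    return [bucket_for(today - timedelta(days=i)) for i in range(n)]


if __name__ == "__main__":
    params = buckets_last_n_days(30, date(2018, 7, 6))
    print(len(params), params[0], params[-1])
```

For date(2018, 7, 6) this yields 30 key pairs, from ('07-18', '06') back to ('06-18', '07'), so the window spans two month buckets. Each of the 30 reads then hits a single partition instead of scanning the whole table.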