Reputation: 4613
I could not figure out how to know oldest and newest row of very huge table if I have log_date(format: 2015-02-20 00:00:00+0000) column in the table.
I have tried the following variations:
select account_id, log_date FROM my.table where log_date<'2013-03-20 00:00' limit 1 allow filtering;
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
select account_id, log_date from my.table order by log_date desc limit 1;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."
How can I get oldest and newest row of a huge table?
Upvotes: 2
Views: 1789
Reputation: 1136
If the primary key of your column family is (account_id), then order by log_date to get latest account will not work at all.
In CQL, you could only order by clustering keys under specified partition key. So if your primary key is (account_id, logdate), meaning account_id is the partition key and log_date is the clustering key. You could only do order by when specifying account_id in your criterias:
select * from riskless.account_data_pg where account_id = 1 order by log_date;
In short, if you want your column family to store time series data, you need to carefully design your primary key, the design of primary key determines how & what you could order by.
Another option is to use Solr or Elasticsearch together with cassandra, indexing rows in column family, with which you could search more freely with Solr or Elasticsearch search API.
Upvotes: 1