Loki

Reputation: 6261

Cassandra best practice for querying data only by date range?

I'm planning on storing log records in Cassandra, and primarily need to be able to query them by date range. My primary key is a time-based UUID. I've seen lots of examples that allow filtering by date range in addition to some key, but is there any way to efficiently query just by a date range, without such a key, and without using an ordered partitioner?

Upvotes: 2

Views: 481

Answers (1)

G Quintana

Reputation: 4667

No. The partition key (the first element of the primary key) is what lets Cassandra route a query to the nodes that own that partition instead of scanning the whole cluster. But if every row shares the same partition key, the data won't be distributed over the cluster and a few nodes will carry the whole workload. A common compromise is to bucket rows by day, with a table like:

create table log (
   log_type text,
   day text, -- In format YYYY-MM-DD for instance
   id timeuuid,
   message text,
   primary key ((log_type, day), id)
);
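On the write side, the day value has to be derived from the same timestamp as the id, otherwise a row can land in a bucket that date-range reads never visit. A minimal Python sketch, assuming day boundaries are in UTC; `timeuuid_to_datetime` and `day_bucket` are hypothetical helper names, not part of any Cassandra driver. It uses only the standard library, relying on `uuid.UUID.time` (the 60-bit timestamp of a version-1 UUID, in 100 ns units since 1582-10-15).

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100 ns intervals between the UUID v1 epoch (1582-10-15 00:00 UTC)
# and the Unix epoch (1970-01-01 00:00 UTC).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Return the UTC timestamp embedded in a version-1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("expected a time-based (version 1) UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    micros = (u.time - UUID_EPOCH_OFFSET) // 10
    return UNIX_EPOCH + timedelta(microseconds=micros)


def day_bucket(u: uuid.UUID) -> str:
    """Hypothetical helper: the 'day' partition-key value (YYYY-MM-DD, UTC) for a row id."""
    return timeuuid_to_datetime(u).strftime("%Y-%m-%d")


# Usage: generate the id and its matching bucket together, before the INSERT.
row_id = uuid.uuid1()
print(row_id, day_bucket(row_id))
```

Computing the bucket from the id itself (rather than from a separate clock read) guarantees the two can never disagree around midnight.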

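From your date range you can then work out which day values, and therefore which partition keys, are involved. Finish with a range condition on the timeuuid clustering column. CQL does not accept a function such as dateOf() on the column side of a WHERE condition, so compare id against maxTimeuuid()/minTimeuuid() bounds instead:

select * from log where log_type='xxx' and day='2014-02-19' and id > maxTimeuuid(?) and id < minTimeuuid(?)
select * from log where log_type='xxx' and day='2014-02-20' and id > maxTimeuuid(?) and id < minTimeuuid(?)
select * from log where log_type='xxx' and day='2014-02-21' and id > maxTimeuuid(?) and id < minTimeuuid(?)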
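The step of turning a date range into per-day partition queries can be sketched in Python. This is an illustration under assumptions: days are bucketed in UTC, both bounds are timezone-aware datetimes, and the CQL string with `?` markers is meant to be used as a prepared statement (the actual driver call is left out). `day_buckets`, `per_day_queries` and `PER_DAY_QUERY` are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# CQL for one daily partition; the '?' markers are for a prepared statement.
# maxTimeuuid/minTimeuuid turn the timestamp bounds into exclusive timeuuid bounds.
PER_DAY_QUERY = (
    "SELECT * FROM log WHERE log_type = ? AND day = ? "
    "AND id > maxTimeuuid(?) AND id < minTimeuuid(?)"
)


def day_buckets(start: datetime, end: datetime) -> list[str]:
    """YYYY-MM-DD values (UTC) of every daily partition that overlaps [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    return [(first + timedelta(days=i)).isoformat() for i in range((last - first).days + 1)]


def per_day_queries(log_type: str, start: datetime, end: datetime) -> list[tuple[str, tuple]]:
    """One (cql, bind values) pair per partition; the results are merged client-side."""
    return [(PER_DAY_QUERY, (log_type, day, start, end)) for day in day_buckets(start, end)]


start = datetime(2014, 2, 19, 22, 0, tzinfo=timezone.utc)
end = datetime(2014, 2, 21, 3, 0, tzinfo=timezone.utc)
for cql, params in per_day_queries("xxx", start, end):
    print(params[1])  # 2014-02-19, then 2014-02-20, then 2014-02-21
```

Only the first and last day strictly need the id bounds; applying them to the middle days as well is redundant but harmless and keeps every query identical. Since each query hits a different partition, they can be issued concurrently.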

Another option is the ALLOW FILTERING clause, but it performs a full cluster scan, so it is only a good idea if you know that most partitions (say, at least 90%) contain data you are interested in.

select * from log where id > maxTimeuuid(?) and id < minTimeuuid(?) allow filtering

Upvotes: 2
