Knows Not Much
Knows Not Much

Reputation: 31546

Query min partition key based on date range (clustering key)

I have a table Foo in cassandra with 4 columns foo_id bigint, date datetime, ref_id bigint, type int

here the partitioning key is foo_id. the clustering keys are date desc, ref_id and type

I want to write a CSQL query which is the equivalent of the SQL below

select min(foo_id) from foo where date >= '2016-04-01 00:00:00+0000'

I wrote the following CSQL

select foo_id from foo where 
foo_id IN (-9223372036854775808, 9223372036854775807) 
and date >= '2016-04-01 00:00:00+0000';

but this returns empty results.

Then I tried

select foo_id from foo where 
  token(foo_id) > -9223372036854775808 
  and token(foo_id) < 9223372036854775807 
  and date >= '2016-04-01 00:00:00+0000';

but this results in error

Unable to execute CSQL Script on 'Cassandra'. Cannot execute this query
as it might involve data filtering and thus may have unpredictable 
performance. If you want to execute this query despite performance   
unpredictability, use ALLOW FILTERING.

I don't want to use ALLOW FILTERING. but I want the minimum of foo_id at the start of the specified date.

Upvotes: 0

Views: 1016

Answers (1)

xmas79
xmas79

Reputation: 5180

You should probably denormalize your data and create a new table for the purpose. I propose something like:

CREATE TABLE foo_reverse (
    year int,
    month int,
    day int,

    foo_id bigint, 
    date datetime,
    ref_id bigint, 
    type int,
    PRIMARY KEY ((year, month, day), foo_id)
)

To get the minimum foo_id you would query that table by something like:

SELECT * FROM foo_reverse WHERE year = 2016 AND month = 4 AND day = 1 LIMIT 1;

That table would allow you to query on a "per day" basis. You can change the partition key to better reflect your needs. Beware of the potential hot spots you (and I) could create by selecting an appropriate time range.

Upvotes: 1

Related Questions