Alpit Anand
Alpit Anand

Reputation: 1248

Cassandra asking to allow filter even after mentioning all partition key in query?

I have been trying to model a data in Cassandra, and was trying to filter the data based on date in that, as given by the answer here on SO, Here second answer is not using allow filter.

This is my current schema,

CREATE TABLE Banking.BankData(acctID TEXT, 
    email TEXT, 
    transactionDate Date , 
    transactionAmount double ,
    balance DOUBLE, 
    currentTime timestamp , 
    PRIMARY KEY((acctID, transactionDate), currentTime ) 
WITH CLUSTERING ORDER BY (currentTime DESC);

Now have inserted a data by

INSERT INTO banking.BankData(acctID, email, transactionDate, transactionAmount, balance, currentTime) values ('11', '[email protected]','2013-04-03',10010, 10010, toTimestamp(now()));

Now when I try to query, like

SELECT * FROM banking.BankData WHERE acctID = '11' AND transactionDate >  '2012-04-03';

It's saying me to allow filtering, however in the link mentioned above, it was not the case. The final requirement was to get data by year, month, week and so on, thats why had taken to partition it by day, but date range query is not working. Any suggestion in remodel or i am doing something wrong ? Thanks

Upvotes: 1

Views: 442

Answers (1)

Alex Ott
Alex Ott

Reputation: 87109

Cassandra supports only equality predicate on the partition key columns, so you can use only = operation on it.

Range predicates (>, <, >=, <=) are supported only only on the clustering columns, and it should be a last clustering column of condition.

For example, if you have following primary key: (pk, c1, c2, c3), you can have range predicate as following:

  • where pk = xxxx and c1 > yyyy
  • where pk = xxxx and c1 = yyyy and c2 > zzzz
  • where pk = xxxx and c1 = yyyy and c2 = zzzz and c3 > wwww

but you can't have:

  • where pk = xxxx and c2 > zzzz
  • where pk = xxxx and c3 > zzzz

because you need to restrict previous clustering columns before using range operation.

If you want to perform a range query on this data, you need to declare corresponding column as clustering column, like this:

PRIMARY KEY(acctID, transactionDate, currentTime ) 

in this case you can perform your query. But because you have time component, you can simply do:

PRIMARY KEY(acctID, currentTime ) 

and do the query like this:

SELECT * FROM banking.BankData WHERE acctID = '11' 
   AND currentTime >  '2012-04-03T00:00:00Z';

But you need to take 2 things into consideration:

  1. your primary should be unique - maybe you'll need to add another clustering column, like, transaction ID (for example, as uuid type) - in this case even 2 transactions happen into the same millisecond, they won't overwrite each other;
  2. if you have a lot of transactions per account, then you may need to add an another column into partition key. For example, year, or year/month, so you don't have big partitions.

P.S. In linked answer use of non-equality operation is possible because ts is clustering column.

Upvotes: 3

Related Questions