Reputation: 1248
I have been trying to model a data in Cassandra, and was trying to filter the data based on date in that, as given by the answer here on SO, Here second answer is not using allow filter.
This is my current schema,
CREATE TABLE Banking.BankData(acctID TEXT,
email TEXT,
transactionDate Date ,
transactionAmount double ,
balance DOUBLE,
currentTime timestamp ,
PRIMARY KEY((acctID, transactionDate), currentTime )
WITH CLUSTERING ORDER BY (currentTime DESC);
Now have inserted a data by
INSERT INTO banking.BankData(acctID, email, transactionDate, transactionAmount, balance, currentTime) values ('11', '[email protected]','2013-04-03',10010, 10010, toTimestamp(now()));
Now when I try to query, like
SELECT * FROM banking.BankData WHERE acctID = '11' AND transactionDate > '2012-04-03';
It's saying me to allow filtering, however in the link mentioned above, it was not the case. The final requirement was to get data by year, month, week and so on, thats why had taken to partition it by day, but date range query is not working. Any suggestion in remodel or i am doing something wrong ? Thanks
Upvotes: 1
Views: 442
Reputation: 87109
Cassandra supports only equality predicate on the partition key columns, so you can use only =
operation on it.
Range predicates (>
, <
, >=
, <=
) are supported only only on the clustering columns, and it should be a last clustering column of condition.
For example, if you have following primary key: (pk, c1, c2, c3)
, you can have range predicate as following:
where pk = xxxx and c1 > yyyy
where pk = xxxx and c1 = yyyy and c2 > zzzz
where pk = xxxx and c1 = yyyy and c2 = zzzz and c3 > wwww
but you can't have:
where pk = xxxx and c2 > zzzz
where pk = xxxx and c3 > zzzz
because you need to restrict previous clustering columns before using range operation.
If you want to perform a range query on this data, you need to declare corresponding column as clustering column, like this:
PRIMARY KEY(acctID, transactionDate, currentTime )
in this case you can perform your query. But because you have time component, you can simply do:
PRIMARY KEY(acctID, currentTime )
and do the query like this:
SELECT * FROM banking.BankData WHERE acctID = '11'
AND currentTime > '2012-04-03T00:00:00Z';
But you need to take 2 things into consideration:
uuid
type) - in this case even 2 transactions happen into the same millisecond, they won't overwrite each other;P.S. In linked answer use of non-equality operation is possible because ts
is clustering column.
Upvotes: 3