Reputation: 1
I want to be able to use IN
in any column in any order in Cassandra
So I have the next table:
CREATE TABLE test (a TEXT, b TEXT, c TEXT, PRIMARY KEY (a, b, c));
and this data:
INSERT INTO test (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO test (a, b, c) VALUES ('a2', 'b2', 'c2');
This query works:
SELECT * FROM test WHERE c IN ('c1', 'c2') AND b IN ('b1') ALLOW FILTERING;
But if you remove the b IN
it gives this error:
SELECT * FROM test WHERE c IN ('c1', 'c2') ALLOW FILTERING;
InvalidRequest: Error from server: code=2200 [Invalid query] message="IN
restrictions are not supported on indexed columns"
Seems like if I want to use IN
in a column I should have used IN
in some previous columns?
Is there a way to avoid this?
Modifying the Schema is valid
but I need to use Cassandra and allow filtering through any columns (if there's no need to filter thought a columns then there would be no IN
clause for that column).
Thanks for reading.
P.S: I know you are not supposed to use ALLOW FILTERING
please assume there's no other way.
Edit: Seems like they may have fixed this?: https://issues.apache.org/jira/browse/CASSANDRA-14344
Upvotes: 0
Views: 724
Reputation: 644
There is a lot of confusion cassandra's primary keys. In order to respond to your question, i think you need to understand how cassandra primary keys are working internally.
When you are creating a Primary key with multiple fields like in your case:
CREATE TABLE test (a TEXT, b TEXT, c TEXT, PRIMARY KEY (a, b, c));
The cassandra documentation states that you can only use In clause on last column of the partition key and the last of the clustering key, but attention you'll have to provide all the other clustering keys.
So basically there is no way to do that in one table.
You should think of a tradeOff of your query flexibility vs data duplication. One solution will be to denormalize your data in 2 tables like this:
CREATE TABLE test1 (a TEXT, b TEXT, c TEXT, PRIMARY KEY (a, b));
CREATE TABLE test2 (a TEXT, b TEXT, c TEXT, PRIMARY KEY (c, a, b));
By doing so, you will be able to query each table depending on your use case. The following queries will work:
SELECT * FROM test2 WHERE c IN ('c1', 'c2');
SELECT * FROM test1 WHERE a IN ('a1', 'a2');
SELECT * FROM test1 WHERE b IN ('b1', 'b2') ALLOW FILTERING;
And so on, i think you got the point. But really try to do the best tradeoff, in order to minimize the allow filtering usage. and remember that the queries on partition keys directly will be the fastest.
Upvotes: 1