Praveen
Praveen

Reputation: 11

Cassandra Select query

In this table los.configuration is having a partition key consisting of 3 columns (1. groupname 2. class 3.yearofjoining). There may be 2 class BPL and APL. I want to select those two categories from database.

So I have to use the query:

SELECT * FROM CONFIGURATION WHERE GROUPNAME = 332 
AND CLASS IN ('APL','BPL') AND YEAROFJOINING IN (2014,2015);

When I am trying this query it is returning an error as

Partition KEY part class cannot be restricted by IN relation (only the last part of the partition key can).

Any idea what the problem is?

Upvotes: 0

Views: 2759

Answers (1)

Aaron
Aaron

Reputation: 57748

The main problem here, is that you are trying to use an IN relation on two parts of a composite partition key. IN can only operate on the last part of a partition or clustering key.

To get your query to work, your PRIMARY KEY will need to partition on GROUPNAME and CLASS, while clustering on YEAROFJOINING:

PRIMARY KEY ((groupname, class), yearofjoining))

When I query your table with that PRIMARY KEY, using your original query:

aploetz@cqlsh:stackoverflow> SELECT * FROM CONFIGURATION 
    WHERE GROUPNAME = 332 AND CLASS IN ('APL','BPL') 
    AND YEAROFJOINING IN (2014,2015);

 groupname | class | yearofjoining | value
-----------+-------+---------------+-------
       332 |   APL |          2014 | test1
       332 |   APL |          2015 | test3
       332 |   BPL |          2014 | test2

(3 rows)

Additional thoughts:

  1. I did not know that you could use an IN relation on both the last part of a partition and clustering key in the same query. So I learned something today!

  2. I feel compelled to warn you that the use of IN on your partition key is known not to perform well (Is the IN relation in Cassandra bad for queries?). In fact, multi-rowkey querying has been identified as a Cassandra anti-pattern.

  3. The more I think about this one, the more I think you'll get better performance if you do a slice query on YEAROFJOINING (instead of an IN). So if the years you want to search for will be in-order, say 2014 and higher, a better query would work like this:

    aploetz@cqlsh:stackoverflow> SELECT * FROM CONFIGURATION WHERE GROUPNAME = 332 AND CLASS IN ('APL','BPL') AND YEAROFJOINING >= 2014;

That will be faster because of the clustering key on YEAROFJOINING, which will leverage your on-disk sort order.

Upvotes: 2

Related Questions