wonder
wonder

Reputation: 903

Cassandra order by and filter through secondary indexes

My cassandra table structure is as follows:

CREATE TABLE sujata
          ... (ID int, roll_number int, age int, PRIMARY KEY (ID,roll_number));

I had inserted some of the records where ID acts as a partition for a number of rows. I am performing the following query:

SELECT count(*) FROM sujata WHERE ID=1 ORDER BY roll_number ASC and age=24 Allow Filtering;

I am getting the following error:

missing EOF at 'and' (...1 ORDER BY roll_number ASC [and] age...)">

I don't know if it possible to filter out the results after performing order by. Please suggest where am i missing. Thank you.

Upvotes: 1

Views: 2776

Answers (1)

Aaron
Aaron

Reputation: 57758

  1. Don't use ALLOW FILTERING. It doesn't perform or scale, because it allows you to query Cassandra in ways that it is not designed to support.

  2. CQL is interpreting your statement as ORDER BY roll_number ASC and age=24 as you trying to ORDER BY two things. The AND belongs in your WHERE clause, and needs to be specified before ORDER BY.

  3. Cassandra uses your clustering keys to write your data's on-disk sort order. ORDER BY only allows you to flip the sort direction (ascending vs. descending) of your clustering order. So if you have already specified the correct sort order in your table definition, then you don't need to specify ORDER BY.

  4. To query your table by both ID and age, you need to design your PRIMARY KEY with those two columns as the first two. You can only query on columns defined in your PRIMARY KEY (secondary indexes not withstanding), and then only in the same order (you can't skip keys). To accomplish this, I created a query table (sujataByIDAndAge) that looks like this:

.

CREATE TABLE sujataByIDAndAge (
    ID int, 
    roll_number int, 
    age int, 
PRIMARY KEY (ID,age,roll_number));

Now after inserting a few rows:

aploetz@cqlsh:stackoverflow> INSERT INTO sujatabyidandage  (id, roll_number, age) 
                             VALUES (2, 20, 26);
aploetz@cqlsh:stackoverflow> INSERT INTO sujatabyidandage  (id, roll_number, age) 
                             VALUES (1, 100, 24);
aploetz@cqlsh:stackoverflow> INSERT INTO sujatabyidandage  (id, roll_number, age) 
                             VALUES (1, 110, 24);
aploetz@cqlsh:stackoverflow> INSERT INTO sujatabyidandage  (id, roll_number, age) 
                             VALUES (1, 190, 24);

Now I can run your query, and I don't need to specify ORDER BY or ALLOW FILTERING:

aploetz@cqlsh:stackoverflow> SELECT COUNT(*) FROM sujatabyidandage WHERE ID=1 AND age=24;

 count
-------
     3

(1 rows)

Also important to mention, is that if result set order is important to you then you must query by the primary keys. Cassandra only enforces order of results within a partition key (ID in your case).

Upvotes: 1

Related Questions