Vishal
Vishal

Reputation: 20617

Cassandra query by primary key and a column throws error

Schema

CREATE TABLE books (
   isbn text PRIMARY KEY,
   author text
);

insert into books (isbn, author) values ('111', 'Sally');
insert into books (isbn, author) values ('112', 'Fred');
insert into books (isbn, author) values ('113', 'Joe');

With the above data, I am able to query via primary key 111

select * from books where isbn = '111';

However, when I put author in the where condition it throws error

select * from books where isbn = '111' and author = 'Fred';

Query 1 ERROR: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

I am not able to understand that if the data is already filtered by the primary key (which is only one record) why does it throw an error ?

Second, if I use allow filtering is there any performance impact?

Edit: https://dzone.com/articles/apache-cassandra-and-allow-filtering has given me some clue.

Upvotes: 1

Views: 1671

Answers (2)

chas spenlau
chas spenlau

Reputation: 375

In Cassandra you cannot query on non-partition or clustering key columns.

If you want your exact above use case to work, author has to be included in your primary key declaration.

CREATE TABLE books ( isbn text, author text, PRIMARY KEY(isbn, author) )

NOTE: With this data model you MUST ALWAYS query with isbn but author is optional.

You should not use allow filtering. There will be a performance impact.

How you declare your partition and clustering keys in cassandra is very important and should be used on a query by query basis. If you have a new use case for querying the same data but in a different where clause, make a new table! :D

Upvotes: 2

Steven Hollingsworth
Steven Hollingsworth

Reputation: 31

This link might help you. From what I understand there is a performance hit with cassandra if you allow filtering (guessing it's similar to SQL RBAR, which is slow, and why it throws and error about filtering)

I'm still relatively new to cassandra, but from what I've read you need to run the select query to include ALL columns defined in your primary key, IF you haven't defined a secondary index. But there are limitations with secondary indexes.

HTH Good luck.

Upvotes: 3

Related Questions