gstackoverflow
gstackoverflow

Reputation: 37034

How do I filter by a CQL column not in the PRIMARY KEY?

I have following table in Cassandra:

create table ibc.my_table
(
    my_key_1 uuid,
    my_key_2 uuid,
    my_key_3 uuid,
    end_timestamp timestamp,
    locked boolean,
    maximum_duration_minutes int,
    maximum_participant_count int,
    owner_id uuid,
    start_timestamp timestamp,
    status_code text,
    primary key ((my_key_1, my_key_2), my_key_3)
)

This query works:

select * from my_table
where my_key_1=3fa85f64-5717-4562-b3fc-2c963f66afa6
    AND my_key_2=3fa85f64-5717-4562-b3fc-2c963f66afa6

but this one - not:

select * from my_table
where my_key_1=3fa85f64-5717-4562-b3fc-2c963f66afa6
    AND my_key_2=3fa85f64-5717-4562-b3fc-2c963f66afa6
    AND owner_id=3fa85f64-5717-4562-b3fc-2c963f66afa6

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

I don't want to use ALLOW FILTERING because of performance. I also can't make owner_id as a part of primary key.

Do I have another option in Cassandra?

Upvotes: 2

Views: 2058

Answers (1)

Aaron
Aaron

Reputation: 57748

In this case, I would use a Secondary Index. Typically, you would have to be careful over performance with Secondary Indexes, just as you would with using ALLOW FILTERING.

But in this particular case, it's ok. Your query's WHERE clause is filtering on both of your partition keys (my_key_1 and my_key_2...terrible names, BTW). By specifying both partition keys, you're ensuring that the query can be served by a single node, and the index will simply help Cassandra find the matching entries inside the partition.

To add the index:

CREATE INDEX ON ibc.my_table(owner_id);

Then re-run the query:

SELECT * FROM ibc.my_table
WHERE my_key_1=3fa85f64-5717-4562-b3fc-2c963f66afa6
  AND my_key_2=3fa85f64-5717-4562-b3fc-2c963f66afa6
  AND owner_id=3fa85f64-5717-4562-b3fc-2c963f66afa6 ;

 my_key_1                             | my_key_2                             | my_key_3                             | end_timestamp | locked | maximum_duration_minutes | maximum_participant_count | owner_id                             | start_timestamp                 | status_code
--------------------------------------+--------------------------------------+--------------------------------------+---------------+--------+--------------------------+---------------------------+--------------------------------------+---------------------------------+-------------
 3fa85f64-5717-4562-b3fc-2c963f66afa6 | 3fa85f64-5717-4562-b3fc-2c963f66afa6 | aff2206e-fc3f-4ab4-9918-38da59c1f31d |          null |  False |                     null |                      null | 3fa85f64-5717-4562-b3fc-2c963f66afa6 | 2022-03-11 00:19:49.699000+0000 |           5

(1 rows)

This should perform well, as long as the index is always used with the complete partition key.

Upvotes: 2

Related Questions