Reputation: 39
Suppose I have table with the following structure
create table tasks (
user_id uuid,
name text,
task_id uuid,
description text,
primary key ((user_id), name, task_id)
);
It allows me to get all tasks for user sorted by name
ascending. Also I added task_id
to primary key to avoid upserts. The following query holds
select * from tasks where user_id = ?
as well as
select * from tasks where user_id = ? and name > ?
However, I cannot get task with specific task_id
. For example, following query crashes
select * from tasks where user_id = ? and task_id = ?
with this error
PRIMARY KEY column "task_id" cannot be restricted as preceding column "name" is not restricted
It requires name
column to be specified, but at the moment I have only task_id
(from url, for example) and user_id
(from session).
How should I create this table to perform both queries? Or I need create separate table for second case? What is the common pattern in this situation?
Upvotes: 1
Views: 155
Reputation: 326
If you have the extra disk space, the best method would be to replicate the data in a second table. You should avoid using secondary indexes in production. Your application would, of course, need to write to both these tables. But Cassandra is darn good at making that efficient.
create table tasks_by_name (
user_id uuid,
name text,
task_id uuid,
description text,
primary key ((user_id), name, task_id)
);
create table tasks_by_id (
user_id uuid,
name text,
task_id uuid,
description text,
primary key ((user_id), task_id)
);
Upvotes: 0
Reputation: 57748
PRIMARY KEY column "task_id" cannot be restricted as preceding
column "name" is not restricted
You are seeing this error because CQL does not permit queries to skip primary key components.
How should I create this table to perform both queries? Or I need create separate table for second case? What is the common pattern in this situation?
As you suspect, the typical way that problems like this are solved with Cassandra is that an additional table is created for each query. In this case, recreating the table with a PRIMARY KEY designed to match your additional query pattern would simply look like this:
create table tasks_by_user_and_task (
user_id uuid,
name text,
task_id uuid,
description text,
primary key ((user_id), task_id)
);
You could simply add one more redundant column taskId with same value as task_id and create a secondary index on taskId.
While I am usually not a fan of using secondary indexes, in this case it may perform ok. Reason being, is that you would still be restricting your query by partition key, which would eliminate the need to examine additional nodes. The drawback (as Undefined_variable pointed out) is that you cannot create a secondary index on a primary key component, so you would need to duplicate that column (and apply the index to the non-primary key column) to get that solution to work.
It might be a good idea to model and test both solutions for performance.
Upvotes: 0
Reputation: 6218
You could simply add one more redundant column taskId with same value as task_id and create a secondary index on taskId.
Then you can query user_id=? and tsakId=?
Upvotes: 1