Reputation: 39

How to choose a proper table structure in Cassandra?

Suppose I have a table with the following structure:

create table tasks (
   user_id uuid,
   name text,
   task_id uuid,
   description text,
   primary key ((user_id), name, task_id)
);

It lets me get all tasks for a user, sorted by name ascending. I also added task_id to the primary key so that two tasks with the same name do not overwrite each other (upsert). The following queries work:

select * from tasks where user_id = ?

select * from tasks where user_id = ? and name > ?

However, I cannot get a task by a specific task_id. For example, the following query fails:

select * from tasks where user_id = ? and task_id = ?

with this error:

PRIMARY KEY column "task_id" cannot be restricted as preceding column "name" is not restricted

It requires the name column to be specified, but at that point I only have task_id (from the URL, for example) and user_id (from the session).

How should I create this table so that both queries work? Or do I need to create a separate table for the second case? What is the common pattern in this situation?

Upvotes: 1

Answers (3)

Dan

Reputation: 326

If you have the extra disk space, the best method would be to replicate the data in a second table. You should avoid using secondary indexes in production. Your application would, of course, need to write to both these tables. But Cassandra is darn good at making that efficient.

create table tasks_by_name (
   user_id uuid,
   name text,
   task_id uuid,
   description text,
   primary key ((user_id), name, task_id)
);

create table tasks_by_id (
   user_id uuid,
   name text,
   task_id uuid,
   description text,
   primary key ((user_id), task_id)
);
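Writing to both tables could be sketched as a logged batch, which ensures that if one insert is applied the other eventually is too, at some extra write cost. The UUIDs and text values below are made up for illustration; a real application would bind them as parameters.

```cql
-- illustrative values only; both inserts carry the same task data
begin batch
   insert into tasks_by_name (user_id, name, task_id, description)
   values (62c36092-82a1-3a00-93d1-46196ee77204, 'write report',
           1c4d2a3e-5b6f-4a7b-8c9d-0e1f2a3b4c5d, 'Quarterly summary');

   insert into tasks_by_id (user_id, name, task_id, description)
   values (62c36092-82a1-3a00-93d1-46196ee77204, 'write report',
           1c4d2a3e-5b6f-4a7b-8c9d-0e1f2a3b4c5d, 'Quarterly summary');
apply batch;
```

Note that name is part of the primary key in tasks_by_name, so renaming a task there means deleting the old row and inserting a new one rather than updating in place.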

Upvotes: 0

Aaron

Reputation: 57748

PRIMARY KEY column "task_id" cannot be restricted as preceding 
  column "name" is not restricted

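You are seeing this error because CQL does not permit queries to skip primary key components. Clustering columns have to be restricted in the order they are declared in the key, so task_id cannot be restricted while the preceding column, name, is left unrestricted.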
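To make the rule concrete, here is how a few queries fare against the original tasks table, whose primary key is ((user_id), name, task_id). This is only a sketch using bind markers, as in the question.

```cql
-- accepted: partition key only
select * from tasks where user_id = ?;

-- accepted: clustering columns restricted left to right, none skipped
select * from tasks where user_id = ? and name = ? and task_id = ?;

-- rejected: task_id is restricted but the preceding clustering column name is not
select * from tasks where user_id = ? and task_id = ?;
```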

How should I create this table so that both queries work? Or do I need to create a separate table for the second case? What is the common pattern in this situation?

As you suspect, the typical way that problems like this are solved with Cassandra is that an additional table is created for each query. In this case, recreating the table with a PRIMARY KEY designed to match your additional query pattern would simply look like this:

create table tasks_by_user_and_task (
   user_id uuid,
   name text,
   task_id uuid,
   description text,
   primary key ((user_id), task_id)
);
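With that table in place, the lookup from the question could be issued against it directly. Both primary key components are restricted, so it returns at most one row.

```cql
-- user_id comes from the session, task_id from the URL
select * from tasks_by_user_and_task where user_id = ? and task_id = ?;
```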

You could simply add one more redundant column taskId with same value as task_id and create a secondary index on taskId.

While I am usually not a fan of using secondary indexes, in this case it may perform OK. The reason is that you would still be restricting your query by partition key, which would eliminate the need to examine additional nodes. The drawback (as undefined_variable pointed out) is that you cannot create a secondary index on a primary key component, so you would need to duplicate that column (and apply the index to the non-primary-key column) to get that solution to work.
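A minimal sketch of the index-based alternative, following the premise above that the index goes on a duplicated, non-primary-key column. The column name task_id_copy and the index name are hypothetical, and the application would have to write the same value to both task_id and task_id_copy on every insert (and backfill it for existing rows).

```cql
-- hypothetical duplicate of task_id, stored as a regular column
alter table tasks add task_id_copy uuid;

-- secondary index on the regular column
create index tasks_task_id_copy_idx on tasks (task_id_copy);

-- the partition key restriction keeps the lookup within one user's partition
select * from tasks where user_id = ? and task_id_copy = ?;
```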

It might be a good idea to model and test both solutions for performance.

Upvotes: 0

undefined_variable

Reputation: 6218

You could simply add one more redundant column, taskId, with the same value as task_id and create a secondary index on taskId. Then you can query with user_id = ? and taskId = ?

Upvotes: 1
