Kenneth Woods
Kenneth Woods

Reputation: 65

Create an index, or add to the primary key and create a new table?

In Cassandra, I have a table with columns (a,b,c). I either need to query SELECT * FROM {table} WHERE a = ? and b = ? and SELECT * FROM {table} WHERE a = ? and c = ?.

In this case, what should I make the primary key? Could I make two tables with PRIMARY KEY(a,b) and PRIMARY KEY(a,c), because Cassandra needs the entirety of the partition key and/or non-partition keys in the order they are listed? Or could I do something like PRIMARY KEY(a) and create an INDEX on b and c?

Basically, should the primary key only contain the minimum number of values required for uniqueness (and choosing an appropriate partition key from these values)? Will performance improve if I add other columns to the primary key because I need to query them?

Upvotes: 1

Views: 99

Answers (1)

medvekoma
medvekoma

Reputation: 1181

As noted above, a well-grounded answer can only be given if you provide more information about the cardinality of the a, b and c columns. Also make sure you understand the meaning of partitioning key and clustering key - they are both part of primary key, and have a huge impact on your design.

If you have enough distinct values in column a, you can make it a partition key, and choose one from the following two approaches:

1) separate table for each query

CREATE TABLE table1_by_ab (
  a int, b int, c int, 
  PRIMARY KEY (a, b));

CREATE TABLE table1_by_ac (
  a int, b int, c int, 
  PRIMARY KEY (a, c));

2) one table for the more frequent query, and index for the other column:

CREATE TABLE table2 (
  a int, b int, c int, 
  PRIMARY KEY (a, b));

CREATE INDEX ON table2 (c);

In both cases you can execute your queries on (a,b) and (a,c). Usually it is recommended to avoid secondary indexes, but in case 2) your query on (a,c) pre-selects the partition key (field a), so the secondary index can be executed on a single node, and its performance will not be bad.

If you don't have enough distinct values in column a, then you cannot make it a partitioning key, you will need to duplicate your tables, both with a compound partitioning key:

CREATE TABLE table3_by_ab (
  a int, b int, c int, 
  PRIMARY KEY ((a, b)));

CREATE TABLE table3_by_ac (
  a int, b int, c int, 
  PRIMARY KEY ((a, c)));

Hope this helps

Upvotes: 2

Related Questions