Reputation: 65
In Cassandra, I have a table with columns (a,b,c)
. I either need to query SELECT * FROM {table} WHERE a = ? and b = ?
and SELECT * FROM {table} WHERE a = ? and c = ?
.
In this case, what should I make the primary key? Could I make two tables with PRIMARY KEY(a,b)
and PRIMARY KEY(a,c)
, because Cassandra needs the entirety of the partition key and/or non-partition keys in the order they are listed? Or could I do something like PRIMARY KEY(a)
and create an INDEX
on b
and c
?
Basically, should the primary key only contain the minimum number of values required for uniqueness (and choosing an appropriate partition key from these values)? Will performance improve if I add other columns to the primary key because I need to query them?
Upvotes: 1
Views: 99
Reputation: 1181
As noted above, a well-grounded answer can only be given if you provide more information about the cardinality of the a, b and c columns. Also make sure you understand the meaning of partitioning key and clustering key - they are both part of primary key, and have a huge impact on your design.
If you have enough distinct values in column a, you can make it a partition key, and choose one from the following two approaches:
1) separate table for each query
CREATE TABLE table1_by_ab (
a int, b int, c int,
PRIMARY KEY (a, b));
CREATE TABLE table1_by_ac (
a int, b int, c int,
PRIMARY KEY (a, c));
2) one table for the more frequent query, and index for the other column:
CREATE TABLE table2 (
a int, b int, c int,
PRIMARY KEY (a, b));
CREATE INDEX ON table2 (c);
In both cases you can execute your queries on (a,b) and (a,c). Usually it is recommended to avoid secondary indexes, but in case 2) your query on (a,c) pre-selects the partition key (field a), so the secondary index can be executed on a single node, and its performance will not be bad.
If you don't have enough distinct values in column a, then you cannot make it a partitioning key, you will need to duplicate your tables, both with a compound partitioning key:
CREATE TABLE table3_by_ab (
a int, b int, c int,
PRIMARY KEY ((a, b)));
CREATE TABLE table3_by_ac (
a int, b int, c int,
PRIMARY KEY ((a, c)));
Hope this helps
Upvotes: 2