Reputation: 439
What criteria should be considered when selecting a row key for a column family in Cassandra? I want to migrate a relational database whose tables do not have any primary key. In that case, what would be the best choice of row key?
Upvotes: 1
Views: 1195
Reputation: 8068
Choose your partition key(s) based on how you want the data stored and how you will always look it up. In practice you retrieve data by partition key, so it's important to pick something you will naturally query on (this is why data in Cassandra is sometimes denormalized by storing it in multiple tables that mimic materialized views).
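As a rough sketch of that denormalization, the same user data could be written to one table per lookup pattern. The table names and the sample values below are hypothetical and only illustrate the idea:

```cql
-- Hypothetical tables: the same user data stored twice, once per lookup pattern.
CREATE TABLE users_by_name (
    user_name  varchar,
    user_email varchar,
    password   varchar,
    PRIMARY KEY (user_name)
);

CREATE TABLE users_by_email (
    user_email varchar,
    user_name  varchar,
    password   varchar,
    PRIMARY KEY (user_email)
);

-- Each query supplies the full partition key of the table it reads from.
SELECT * FROM users_by_name  WHERE user_name  = 'ABC';
SELECT * FROM users_by_email WHERE user_email = 'abc@example.com';
```

The application has to write to both tables itself, which is the cost of being able to look the data up either way.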
The clustering column(s), if any, are mostly useful when you sometimes want to retrieve all the data in a partition and sometimes only part of it. This is great for things like time-series data, because you can cluster the data on a timeuuid, store it sorted, and then run efficient range queries over it.
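A minimal sketch of the time-series case, assuming a hypothetical sensor_readings table (the table, column names, and dates are made up for illustration):

```cql
-- Hypothetical time-series table: one partition per sensor,
-- rows inside the partition sorted by the timeuuid clustering column.
CREATE TABLE sensor_readings (
    sensor_id   varchar,
    reading_id  timeuuid,
    temperature double,
    PRIMARY KEY (sensor_id, reading_id)
) WITH CLUSTERING ORDER BY (reading_id DESC);

-- Retrieve the whole partition.
SELECT * FROM sensor_readings WHERE sensor_id = 'sensor-1';

-- Retrieve only a slice of the partition with a range on the clustering column.
SELECT * FROM sensor_readings
WHERE sensor_id = 'sensor-1'
  AND reading_id > maxTimeuuid('2013-01-01 00:00+0000')
  AND reading_id < minTimeuuid('2013-02-01 00:00+0000');
```

Both queries name the partition key; the second one additionally narrows the result using the sort order within the partition.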
Upvotes: 0
Reputation: 4600
There are several things to consider when choosing the primary key for a Cassandra table.
First, understand the difference between the primary key and the partition key:
CREATE TABLE users ( user_name varchar PRIMARY KEY, password varchar );
In the above case the primary key and the partition key are the same.
CREATE TABLE users (
    user_name varchar,
    user_email varchar,
    password varchar,
    PRIMARY KEY (user_name, user_email)
);
Here the primary key is user_name and user_email together, whereas the partition key is user_name alone (user_email acts as a clustering column).
CREATE TABLE users (
    user_name varchar,
    user_email varchar,
    password varchar,
    PRIMARY KEY ((user_name, user_email))
);
Here the primary key and the partition key are both the pair (user_name, user_email).
Cassandra organizes data so that the partition key is what lookups use. Using the previous examples:
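For the first case (PRIMARY KEY (user_name, user_email), where user_name alone is the partition key):
user_name ---> email:password email:date_of_birth
ABC --> [email protected]:abc123 [email protected]:22/02/1950 [email protected]:def123...
In the second case (PRIMARY KEY ((user_name, user_email)), where both columns form the partition key):
user_name,email ---> password date_of_birth
ABC,[email protected] --> abc123 22/02/1950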
Making the partition key more complex, so that it contains more of the data, ensures that you get many rows instead of a single row with many columns. It can be worth balancing the number of rows you create against the number of columns each row will hold. Having incredibly large or incredibly small rows may not be good for reads.
Partition keys determine how data is distributed across nodes, so consider whether you will have hotspots and decide whether you want to break the key down further.
Case 1: All users named ABC will be on a single node.
Case 2: Users named ABC may or may not be on the same node, depending on the key generated from the user name combined with the email.
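To make the two cases concrete, here is a hedged sketch of how the queries differ. The table names users_clustered and users_composite are hypothetical stand-ins for the two users definitions above, and the email value is a placeholder:

```cql
-- Case 1: PRIMARY KEY (user_name, user_email)
-- The partition key is user_name, so one lookup returns every email stored under ABC.
SELECT * FROM users_clustered WHERE user_name = 'ABC';

-- Case 2: PRIMARY KEY ((user_name, user_email))
-- Both columns are needed to locate the partition.
SELECT * FROM users_composite
WHERE user_name = 'ABC' AND user_email = 'abc@example.com';

-- Supplying only user_name in case 2 does not identify a partition; depending on
-- the Cassandra version this is rejected or requires ALLOW FILTERING.

-- token() shows the value used to place each partition, which is why
-- rows for ABC can end up on different nodes in case 2.
SELECT token(user_name, user_email), user_name, user_email FROM users_composite;
```

So case 1 favours "give me everything for this user" reads, while case 2 spreads the same user's data out at the cost of needing the full key on every lookup.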
Upvotes: 0
Reputation: 5249
Use natural keys that can be derived from the dataset if possible (e.g. phone_number for a phone book, user_name for a user table). If that's not possible, use a UUID.
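For a migrated table that has no natural key, a surrogate UUID key might look like the sketch below. The table and column names are hypothetical, and uuid() assumes a Cassandra version that provides that built-in function; otherwise the client can generate the UUID:

```cql
-- Hypothetical table for rows migrated from a relational table without a primary key.
CREATE TABLE legacy_rows (
    id      uuid PRIMARY KEY,  -- surrogate key generated at insert time
    payload varchar
);

-- uuid() generates a random (type 4) UUID on the server side.
INSERT INTO legacy_rows (id, payload) VALUES (uuid(), 'migrated row');
```

Keep in mind that a random UUID only gives uniqueness and even distribution; you can look a row up again only if you have stored its id somewhere.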
Upvotes: 1