ChannaB

Reputation: 439

Select Cassandra row key

What criteria should be considered when selecting a row key for a column family in Cassandra? I want to migrate a relational database that does not contain any primary key. In that case, what would be the best row key to choose?

Upvotes: 1

Views: 1195

Answers (3)

Gordon Seidoh Worley

Reputation: 8068

Choose your partition key(s) based on how you want to store the data and how you will always look it up. Efficient reads go through the partition key, so it's important to choose something that you will naturally look up (this is why data is sometimes denormalized in Cassandra by storing it in multiple tables that mimic materialized views).
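
As a rough sketch of that kind of denormalization, the same user data could be stored twice, once per lookup pattern. The table names `users_by_name` and `users_by_email` and the sample email address are made up for illustration, not taken from the question:

```cql
-- Hypothetical tables: one copy of the data per way of looking it up.
CREATE TABLE users_by_name (
  user_name varchar PRIMARY KEY,
  user_email varchar,
  password varchar
);

CREATE TABLE users_by_email (
  user_email varchar PRIMARY KEY,
  user_name varchar,
  password varchar
);

-- Each query is a single-partition lookup in the table built for it.
SELECT * FROM users_by_name WHERE user_name = 'ABC';
SELECT * FROM users_by_email WHERE user_email = 'abc@example.com';
```

The application has to write to both tables itself, which is the price of having both lookups be cheap.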

The clustering column(s), if any, are mostly useful if you sometimes want to retrieve all the data in a partition and sometimes only want some of it. This is great for things like time series data because you can cluster the data on a timeuuid, store it sorted, and then do efficient range queries over the data.
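
A minimal sketch of the time series case, assuming a hypothetical `sensor_readings` table (the table, columns, and dates are invented for the example):

```cql
-- Hypothetical table: one partition per sensor, rows sorted by timeuuid.
CREATE TABLE sensor_readings (
  sensor_id varchar,
  reading_time timeuuid,
  temperature double,
  PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Whole partition: every reading for one sensor.
SELECT * FROM sensor_readings WHERE sensor_id = 'sensor-1';

-- Range over the clustering column: only the readings from January 2013.
SELECT * FROM sensor_readings
WHERE sensor_id = 'sensor-1'
  AND reading_time > maxTimeuuid('2013-01-01 00:00+0000')
  AND reading_time < minTimeuuid('2013-02-01 00:00+0000');
```

Both queries name the partition key, so each reads from a single partition; the second one additionally uses the sort order to read only a slice of it.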

Upvotes: 0

Desert Ice

Reputation: 4600

There are several things to consider when choosing the primary key for a Cassandra table:

  1. Understand the difference between the primary key and the partition key

    CREATE TABLE users ( user_name varchar PRIMARY KEY, password varchar );

In the above case the primary key and the partition key are the same.

CREATE TABLE users (
  user_name varchar,
  user_email varchar,
  password varchar,
  PRIMARY KEY (user_name, user_email)
);

Here the primary key is user_name and user_email together, whereas user_name alone is the partition key (user_email is a clustering column).

CREATE TABLE users (
  user_name varchar,
  user_email varchar,
  password varchar,
  PRIMARY KEY ((user_name, user_email))
);

Here the primary key and the partition key are both (user_name, user_email).
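
To make the difference between the last two definitions concrete, here is a sketch of the lookups each one supports. The email value is a made-up placeholder:

```cql
-- With PRIMARY KEY (user_name, user_email):
-- user_name alone identifies the partition, so this returns
-- every (user_email, password) row stored under ABC.
SELECT * FROM users WHERE user_name = 'ABC';

-- With PRIMARY KEY ((user_name, user_email)):
-- both columns together form the partition key, so both must be supplied.
SELECT * FROM users
WHERE user_name = 'ABC' AND user_email = 'abc@example.com';

-- Against the composite partition key, a query restricted only by
-- user_name cannot be answered with a single partition lookup.
```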

  2. Carefully define your partition key. Cassandra uses partition keys for lookups, so you must define your partition key by looking at your SELECT queries.

Cassandra organizes data so that it can be looked up by partition key. Using the previous examples:

For the first case, PRIMARY KEY (user_name, user_email):

user_name ---> email:password email:date_of_birth 

ABC --> [email protected]:abc123 [email protected]:22/02/1950 [email protected]:def123...

In the second case, PRIMARY KEY ((user_name, user_email)):

user_name,email ---> password date_of_birth

ABC,[email protected] --> abc123 22/02/1950

  3. Making the partition key more complex, so that it contains several columns, means you get many rows instead of a single row with many columns. It might be beneficial to balance the number of rows you create against the number of columns each row might have. Having incredibly large or incredibly small rows might not be good for reads.

  4. Partition keys determine how data is distributed across nodes, so consider whether you have hotspots and decide whether you want to break the key down further.

Case 1, PRIMARY KEY (user_name, user_email): All users named ABC will be on a single node.

Case 2, PRIMARY KEY ((user_name, user_email)): Users named ABC might or might not be on the same node, depending on the key that is generated from the name together with the email.
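
One common way to break a hot partition down further is to add a bucketing column to the partition key. The sketch below is only an illustration: the `user_events` table, the `event_day` bucket, and the choice of one bucket per day are all assumptions, not something from the question.

```cql
-- Hypothetical table: events for one user are split into one partition per day,
-- so a very active user no longer lands in a single ever-growing partition.
CREATE TABLE user_events (
  user_name varchar,
  event_day varchar,      -- bucket, e.g. '2013-01-01' (assumed format)
  event_time timeuuid,
  payload varchar,
  PRIMARY KEY ((user_name, event_day), event_time)
);

-- Reads must now name the bucket as well as the user.
SELECT * FROM user_events
WHERE user_name = 'ABC' AND event_day = '2013-01-01';
```

The trade-off is that reading a user's full history now takes one query per bucket.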

Upvotes: 0

Stefan Podkowinski

Reputation: 5249

Use natural keys that can be derived from the dataset if possible (e.g. phone_number for a phone book, user_name for a user table). If that's not possible, use a UUID.
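
For a migrated table with no natural key, a UUID key could look roughly like this. The `customers` table and its columns are hypothetical, and the `uuid()` function assumes a Cassandra version that provides it; otherwise the UUID can be generated by the client:

```cql
-- Hypothetical table keyed by a surrogate UUID.
CREATE TABLE customers (
  customer_id uuid PRIMARY KEY,
  name varchar,
  phone_number varchar
);

-- uuid() generates a random (type 4) UUID on the server side.
INSERT INTO customers (customer_id, name, phone_number)
VALUES (uuid(), 'ABC', '555-0100');
```

Keep in mind that a random UUID is only useful for lookups if the application stores or otherwise knows the generated value.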

Upvotes: 1
