Andy Wan
Andy Wan

Reputation: 1100

How to create auto increment IDs in Cassandra

We know that it is easy to create auto increment IDs in SQL databases, is there a good solution for it in Cassandra? The IDs should be for key or column name.

Upvotes: 60

Views: 81749

Answers (8)

AlonL
AlonL

Reputation: 6720

How about the following, using Cassandra's Lightweight transactions

1 - Create IDs table:

CREATE TABLE ids (
  id_name varchar,
  next_id int,
  PRIMARY KEY (id_name)
)

2 - Insert every id you'd like to use a global sequence with

For example:

INSERT INTO ids (id_name, next_id)
VALUES ('person_id', 1)

3 - Then, when inserting to a table where you'd like to use an auto-incremented key, do the following:

3.1 - Get the next_id from the ids table:

SELECT next_id FROM ids WHERE id_name = 'person_id'

Let's say the result is next_id = 1

3.2 - Increment next_id, the following way:

UPDATE ids SET next_id = 2 WHERE id_name = 'person_id' IF next_id = 1

The result should look like this:

[{[applied]: True}]

If it was updated successfully, OR

[{[applied]: False, next_id: 2}]

If someone else has already updated it.

So, if you got True, use id '1' - it is yours. Otherwise, increment next_id (or just use the returned next_id) and repeat the process.

Upvotes: 42

Basav
Basav

Reputation: 3406

I think IMHO expecting Cassandra to provide an auto-incrementing field is WRONG

Cassandra is a elegant decentralised database, hence expecting it to provide a auto-incrmenting field is, taxing and defeats the original purpose, because this value has to be then maintained in a central place

Hence, don't make any solution which DB based to get a auto-incrementing number

Instead generate the ID in the code or service in your app, which can keep generating random unique IDs and use that to apply on your data model, this way the objective & benefit of Cassandra will not be defeated

Upvotes: 4

Luis Matta
Luis Matta

Reputation: 445

Creating a global sequential sequence of number does not really make any sense in a distributed system. Use UUIDs.

(Because you would have to make all participants agree and accept the evolution of the sequence -- under a naive implementation)

Upvotes: 42

mateenk
mateenk

Reputation: 61

there is a counter datatype which can be used. Consider the below example.

CREATE KEYSPACE counterks WITH REPLICATION =
{ 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };

Create a table for the counter column.

CREATE TABLE counterks.page_view_counts
(counter_value counter,
url_name varchar,
page_name varchar,
PRIMARY KEY (url_name, page_name)
);

Load data into the counter column.

UPDATE counterks.page_view_counts
SET counter_value = counter_value + 1
WHERE url_name='www.datastax.com' AND page_name='home';

Take a look at the counter value.

SELECT * FROM counterks.page_view_counts;

Output is:

 url_name         | page_name | counter_value
------------------+-----------+---------------
 www.datastax.com |      home |             1

Increase the value of the counter.

 UPDATE counterks.page_view_counts
 SET counter_value = counter_value + 2
 WHERE url_name='www.datastax.com' AND page_name='home';

Take a look at the counter value.

 url_name         | page_name | counter_value
------------------+-----------+---------------
www.datastax.com |      home |             3  

Refer this for more details: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html

Upvotes: 6

AlonL
AlonL

Reputation: 6720

Edit: This solution is not proper. See the first comment.

My solution:

1 - Create IDs table:

CREATE TABLE ids (
  id_name varchar,
  next_id counter,
  PRIMARY KEY (id_name)
)

2 - When inserting to a table where you'd like to use an auto-incremented key, do the following:

2.1 - Increment counter (it will be created if not exists), using the highest consistency level

UPDATE ids
  SET next_id = next_id + 1
  WHERE id_name = $AUTO_INCREMENTED_ID
  USING CONSISTENCY ALL

2.2 - Get the new id value:

SELECT next_id
  FROM ids
  WHERE id_name = $AUTO_INCREMENTED_ID

2.3 - Insert the value with the auto-incremented id

INSERT INTO some_table ($AUTO_INCREMENTED_ID, ...)
  VALUES ($RESULT_FROM_PREVIOUS_QUERY, ...)

Words starting with '$' in my answer are self-explanatory (I hope) placeholders...

Of course this is not a recommended method. Use it only if you have to.

Upvotes: 2

Tracker1
Tracker1

Reputation: 19344

Do they really need to be sequential, or do you just need counting numbers that are much smaller than a UUID that are easily entered by a person?

If you really need sequential numbers, then you will need to do one of the following.

  • Have a table in cassandra where the key/id is a field for the generator, and the value is a number... do conditional updates in a loop until you successfully increment the count. (bad idea)

  • Have a generator service that will give you the next number. This can only run on a single system and be a single point of failure, but depending on your needs this might be best.

Alternatively... Similar to the first, but get batches of 100 or so numbers at a time, and deal those out inside your process/thread... This will have less contention, but no guarantee of sequential order, only uniqueness... If you only want shorter numbers that are unique for display, this may be your best bet.

Upvotes: 1

Florent
Florent

Reputation: 1399

This question is pretty old but I'd like to complete it with an other solution.

Any solution that relies on nodes synchronization is unreasonable. It's pretty sure to break either by blocking IDs generation or by creating duplicate IDs.

MySQL way

You can reproduce the way it's done with the mysql master-master replication with the auto_increment_increment and auto_increment_offset parameters.

To reproduce it, you need to know the number of nodes or the max number of expected nodes and you need to create a (non-cassandra) counter (a file per example) on each node.

Each time you want to generate a new number, you find the current value, add the increment and save it. If it doesn't exist yet, it's the offset.

So for 10 nodes, you would have an increment of 10 and an offset of 1 for the first node, 2 for the second node, etc. Node 1 would create the IDs 1, 11, 21. Node 2 would create the IDs 2, 21, 22.

If you want your IDs to be (approximatively) ordered between nodes, you need to maintain a shared counter and make sure each generated ID is higher than the shared counter. That way, unless your nodes/datacenters are out of sync for a long time, you shouldn't notice much difference.

Prefixing

You can do basically the same thing by prefixing the ID (if it's an acceptable solution) with the node number (or name). And you don't have to known the number of nodes. Node 1 would create 1_1, 1_2, 1_3. Node 2 would create 2_1, 2_2, 2_3.

Upvotes: 3

Sharpling
Sharpling

Reputation: 2579

There is no good solution.

  1. Create a column with a number, increase the number and save it to all replicas together with a temporary id, read all replicas and check if the temporary id is "yours", if not do it again.. not a great solution and will not scale.

or

  1. Build your own id service where you fetch your next id. This service will only be run in a single instance and will be a non scaling scary factor.

As soon as anything goes beyond a single instance the sequencing of id's gets complicated, at least if you want it to scale. That includes relational databases.

Upvotes: 16

Related Questions