Reputation: 1100
We know that it is easy to create auto increment IDs in SQL databases, is there a good solution for it in Cassandra? The IDs should be for key or column name.
Upvotes: 60
Views: 81749
Reputation: 6720
How about the following, using Cassandra's Lightweight transactions
CREATE TABLE ids (
id_name varchar,
next_id int,
PRIMARY KEY (id_name)
)
For example:
INSERT INTO ids (id_name, next_id)
VALUES ('person_id', 1)
SELECT next_id FROM ids WHERE id_name = 'person_id'
Let's say the result is next_id = 1
UPDATE ids SET next_id = 2 WHERE id_name = 'person_id' IF next_id = 1
The result should look like this:
[{[applied]: True}]
If it was updated successfully, OR
[{[applied]: False, next_id: 2}]
If someone else has already updated it.
So, if you got True, use id '1' - it is yours. Otherwise, increment next_id (or just use the returned next_id) and repeat the process.
Upvotes: 42
Reputation: 3406
I think IMHO expecting Cassandra to provide an auto-incrementing field is WRONG
Cassandra is a elegant decentralised database, hence expecting it to provide a auto-incrmenting field is, taxing and defeats the original purpose, because this value has to be then maintained in a central place
Hence, don't make any solution which DB based to get a auto-incrementing number
Instead generate the ID in the code or service in your app, which can keep generating random unique IDs and use that to apply on your data model, this way the objective & benefit of Cassandra will not be defeated
Upvotes: 4
Reputation: 445
Creating a global sequential sequence of number does not really make any sense in a distributed system. Use UUIDs.
(Because you would have to make all participants agree and accept the evolution of the sequence -- under a naive implementation)
Upvotes: 42
Reputation: 61
there is a counter datatype which can be used. Consider the below example.
CREATE KEYSPACE counterks WITH REPLICATION =
{ 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
Create a table for the counter column.
CREATE TABLE counterks.page_view_counts
(counter_value counter,
url_name varchar,
page_name varchar,
PRIMARY KEY (url_name, page_name)
);
Load data into the counter column.
UPDATE counterks.page_view_counts
SET counter_value = counter_value + 1
WHERE url_name='www.datastax.com' AND page_name='home';
Take a look at the counter value.
SELECT * FROM counterks.page_view_counts;
Output is:
url_name | page_name | counter_value
------------------+-----------+---------------
www.datastax.com | home | 1
Increase the value of the counter.
UPDATE counterks.page_view_counts
SET counter_value = counter_value + 2
WHERE url_name='www.datastax.com' AND page_name='home';
Take a look at the counter value.
url_name | page_name | counter_value
------------------+-----------+---------------
www.datastax.com | home | 3
Refer this for more details: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
Upvotes: 6
Reputation: 6720
Edit: This solution is not proper. See the first comment.
My solution:
1 - Create IDs table:
CREATE TABLE ids (
id_name varchar,
next_id counter,
PRIMARY KEY (id_name)
)
2 - When inserting to a table where you'd like to use an auto-incremented key, do the following:
2.1 - Increment counter (it will be created if not exists), using the highest consistency level
UPDATE ids
SET next_id = next_id + 1
WHERE id_name = $AUTO_INCREMENTED_ID
USING CONSISTENCY ALL
2.2 - Get the new id value:
SELECT next_id
FROM ids
WHERE id_name = $AUTO_INCREMENTED_ID
2.3 - Insert the value with the auto-incremented id
INSERT INTO some_table ($AUTO_INCREMENTED_ID, ...)
VALUES ($RESULT_FROM_PREVIOUS_QUERY, ...)
Words starting with '$' in my answer are self-explanatory (I hope) placeholders...
Of course this is not a recommended method. Use it only if you have to.
Upvotes: 2
Reputation: 19344
Do they really need to be sequential, or do you just need counting numbers that are much smaller than a UUID that are easily entered by a person?
If you really need sequential numbers, then you will need to do one of the following.
Have a table in cassandra where the key/id is a field for the generator, and the value is a number... do conditional updates in a loop until you successfully increment the count. (bad idea)
Have a generator service that will give you the next number. This can only run on a single system and be a single point of failure, but depending on your needs this might be best.
Alternatively... Similar to the first, but get batches of 100 or so numbers at a time, and deal those out inside your process/thread... This will have less contention, but no guarantee of sequential order, only uniqueness... If you only want shorter numbers that are unique for display, this may be your best bet.
Upvotes: 1
Reputation: 1399
This question is pretty old but I'd like to complete it with an other solution.
Any solution that relies on nodes synchronization is unreasonable. It's pretty sure to break either by blocking IDs generation or by creating duplicate IDs.
You can reproduce the way it's done with the mysql master-master replication with the auto_increment_increment
and auto_increment_offset
parameters.
To reproduce it, you need to know the number of nodes or the max number of expected nodes and you need to create a (non-cassandra) counter (a file per example) on each node.
Each time you want to generate a new number, you find the current value, add the increment and save it. If it doesn't exist yet, it's the offset.
So for 10 nodes, you would have an increment of 10 and an offset of 1 for the first node, 2 for the second node, etc. Node 1 would create the IDs 1, 11, 21. Node 2 would create the IDs 2, 21, 22.
If you want your IDs to be (approximatively) ordered between nodes, you need to maintain a shared counter and make sure each generated ID is higher than the shared counter. That way, unless your nodes/datacenters are out of sync for a long time, you shouldn't notice much difference.
You can do basically the same thing by prefixing the ID (if it's an acceptable solution) with the node number (or name). And you don't have to known the number of nodes. Node 1 would create 1_1, 1_2, 1_3. Node 2 would create 2_1, 2_2, 2_3.
Upvotes: 3
Reputation: 2579
There is no good solution.
or
As soon as anything goes beyond a single instance the sequencing of id's gets complicated, at least if you want it to scale. That includes relational databases.
Upvotes: 16