Incremental IDs across shards

Question

I am building my own sharding solution. Each ID is constructed as follows:

shard id (small int)
table type id (small int)
incremental number (big int)

ex. 00001000010000000015

I use virtual shards, so I can point all the shards to a single server and when I need more capacity, I just add another server and point some of the virtual shards to that server, so next time the data will be written onto the new server and not the first one, although read will go to the two servers, at least until I move the data and change it in the config file.

My problem is with the incremental number. I want those to be unique. Using the build int incremental id in mysql for a specific table is not good, because I might move the data to another server, and there I might have another data that use the increment number of its table, so I might get duplicate IDs.

So my question is how can I generate a unique ID in a scalable fashion, without using an external table that will generate that ID, because it won't scale. I can point you to how Pinterest has solved this here which might help. I thought about giving each table a incremental value that skips at different values for every shard, so they incremental values on all shards are never the same.

The idea is to build my own sharding solution with Amazon RDS, so replicas are already there, balancing is easy because I can take advantage of the promote slave to master and just delete data on both servers and change my sharding config file. I think it's possible to build a solution that can easily scale using Amazon RDS and it will be MUCH cheaper than what other companies offer today (already did my homework on that).

I don't want to use GUID in my URLs. I don't mind using long numbers. Pinterest, Tumblr, Facebook and many others don't use Guid, so I know that their is a solution out there, and just want to know which solution do you think works best, taking into consideration that I want to use numerical IDs.

I'm developing my application in ASP.NET C#

Aaron Digulla · Accepted Answer

In a nutshell, your approach to generate IDs is doomed: If you want to generate unique IDs, you need a central service which will eventually become a bottleneck.

Also including the shard ID smells bad; what happens when you move data to a new shard? Will you have to update all the IDs?

If you want a scalable solution, you will have to look at UUIDs or similar approaches.

Alternatively, you could use a central service and allocate a block of IDs (say 10'000) per call. That way, you wouldn't need to hammer the central service that often but if the service fails, the whole system will be dead.

Incremental IDs across shards

Answers (2)

Related Questions