Reputation: 11247
I am building my own sharding solution. Each ID is constructed as follows:
ex. 00001000010000000015
I use virtual shards, so I can point all the shards to a single server and when I need more capacity, I just add another server and point some of the virtual shards to that server, so next time the data will be written onto the new server and not the first one, although read will go to the two servers, at least until I move the data and change it in the config file.
My problem is with the incremental number. I want those to be unique. Using the build int incremental id in mysql for a specific table is not good, because I might move the data to another server, and there I might have another data that use the increment number of its table, so I might get duplicate IDs.
So my question is how can I generate a unique ID in a scalable fashion, without using an external table that will generate that ID, because it won't scale. I can point you to how Pinterest has solved this here which might help. I thought about giving each table a incremental value that skips at different values for every shard, so they incremental values on all shards are never the same.
The idea is to build my own sharding solution with Amazon RDS, so replicas are already there, balancing is easy because I can take advantage of the promote slave to master and just delete data on both servers and change my sharding config file. I think it's possible to build a solution that can easily scale using Amazon RDS and it will be MUCH cheaper than what other companies offer today (already did my homework on that).
I don't want to use GUID in my URLs. I don't mind using long numbers. Pinterest, Tumblr, Facebook and many others don't use Guid, so I know that their is a solution out there, and just want to know which solution do you think works best, taking into consideration that I want to use numerical IDs.
I'm developing my application in ASP.NET C#
Upvotes: 3
Views: 2701
Reputation: 6382
I use the shard id which is Int16 to set AUTO_INCREMENT value to each table during table creation by left-shifting shard id by (64 - 16) and adding 1.
The next example is using ServiceStack.ORMLite library, however it could be done in pure SQL as well. When I need to create a new table, I iterate over each existing shard, open DB connection to that shard (connection details are stored in a lookup table) and call this method:
private static void CreateTableInShard<T>(IDbConnection db, bool overwrite = false, short shardId = 0)
where T : IDataObject, new()
{
using (var trans = db.BeginTransaction())
{
db.CreateTable<T>(overwrite);
if (overwrite)
{
var tableName = typeof(T).Name;
var ai = ((long)shardId << 48) + 1;
var sql = @"ALTER TABLE " + tableName + @"
AUTO_INCREMENT " + ai + @" ;";
db.ExecuteSql(sql);
}
trans.Commit();
}
}
public interface IDataObject
{
long Id { get; set; }
}
I have many logical shards that are initially on one machine. When I will have to scale, I will have to move an entire shard to another machine and to change a connection string for a shard ID in my lookup table. But ID generation after that move won't be affected.
To query a sharded table by id, I could get shard id as (id >> 48)
and then query specific shard.
Upvotes: 0
Reputation: 328760
In a nutshell, your approach to generate IDs is doomed: If you want to generate unique IDs, you need a central service which will eventually become a bottleneck.
Also including the shard ID smells bad; what happens when you move data to a new shard? Will you have to update all the IDs?
If you want a scalable solution, you will have to look at UUIDs or similar approaches.
Alternatively, you could use a central service and allocate a block of IDs (say 10'000) per call. That way, you wouldn't need to hammer the central service that often but if the service fails, the whole system will be dead.
Upvotes: 4