Liron Harel
Liron Harel

Reputation: 11247

Sharding and ID generation as instagram

My question is regarding ID generation for sharded environment. I am following the same steps as instagram did for generating unique ids. I have a few question on the implementation of this id generation in MySQL.

This is how the ID is being generated (This is a PGQL stored procedure.)

CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;
BEGIN
    SELECT nextval('insta5.table_id_seq') %% 1024 INTO seq_id;

    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;



CREATE TABLE insta5.our_table (
    "id" bigint NOT NULL DEFAULT insta5.next_id(),
    ...rest of table schema...
)

MY question is how can I do that in MySQL (equivalent code). This stored procedure should be called from a query to get the next id.

My other question is regarding querying the shards. From what I understand they use logical shards that are maps to actual servers. If they decide to map a logical shard to a new server, this means that they have to query two servers at the same time to aggregate the results, because before they assigned a new server for the same logical shard, some data was added to the older server. I just wanted to know if there's a better way to use logical shards and having the option to query a single server where the data resides instead of querying all the servers that are and were belong to that logical shard?

Thanks.

Upvotes: 1

Views: 4549

Answers (6)

Yugandhar Chaudhari
Yugandhar Chaudhari

Reputation: 3964

This is postgres equivalent

Specify TABLE_SCHEMA and MASKTABLE below

I am creating a MASKTABLE with a dummy first entry

DELIMITER $$
CREATE OR REPLACE FUNCTION generate_next_id() RETURNS bigint NOT DETERMINISTIC
MAIN: BEGIN
DECLARE our_epoch bigint;
DECLARE seq_id bigint;
DECLARE now_millis bigint;
DECLARE shard_id int;
DECLARE param bigint ;
SET @our_epoch = 1568873367231;
SET @shard_id = 1;
SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = "SCHEMANAME" AND TABLE_NAME = "MASKTABLE" into @seq_id;
SELECT FLOOR(UNIX_TIMESTAMP()) * 1000 into @now_millis;
SELECT (@now_millis - @our_epoch) << 23 into @param;
SELECT @param | (@shard_id <<10) into @param; 
select @param | (@seq_id) into @param; 
RETURN @param;
END MAIN;$$ 
DELIMITER ;

Usage

select generate_next_id()

You can use it in trigger like

CREATE TRIGGER trigger_name
BEFORE INSERT ON TableName 
FOR EACH ROW
SET new.id = generate_next_id();

Upvotes: 1

Nate
Nate

Reputation: 1

Implementing the exact same id generation logic with a MySQL stored procedure is not possible. However, this can be done using a MySQL UDF.

Here is a UDF that recreates instagrams id generator with a few changes. Modifying it to work exactly like the instagram id generator would involve updating the MAX_* constants and the bit shifting logic in next_shard_id and shard_id_to_ms.

Upvotes: 0

jsidlosky
jsidlosky

Reputation: 417

I have a question out for the conversion routine I wrote, but it does appear to work!

Can AUTO_INCREMENT be safely used in a BEFORE TRIGGER in MySQL

Once I get the question answered, I'll update this answer as well.

MySQL Version:

CREATE TRIGGER shard_insert BEFORE INSERT ON tablename
FOR EACH ROW BEGIN

DECLARE seq_id BIGINT;
DECLARE now_millis BIGINT;
DECLARE our_epoch BIGINT DEFAULT 1314220021721;
DECLARE shard_id INT DEFAULT 1;

SET now_millis = (SELECT UNIX_TIMESTAMP());
SET seq_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = "dbname" AND TABLE_NAME =     "tablename");
SET NEW.id = (SELECT ((now_millis - our_epoch) << 23) | (shard_id << 10) | (SELECT MOD(seq_id, 1024)));
END

Upvotes: 0

Petr Bela
Petr Bela

Reputation: 8741

Regarding the other question, all data for one logical shard only live on one server in production at a time (without taking replication into account). When they're running out of capacity of their existing servers, they duplicate data in each server to a follower, and when the new server is ready (at this point both the original server and the new server contain all data for a specific logical shard), they start reading all data for half of the shards from the new server, and the other half stays in the original server. See e.g. http://www.craigkerstiens.com/2012/11/30/sharding-your-database/ for illustration.

Upvotes: 0

Doron Levari
Doron Levari

Reputation: 614

At ScaleBase we found a nice way to provide support for autoincrements, through the variable: SET @@auto_increment_increment=4;.

Just make sure each shard has a starting offset and you're free to go. It's straight, simple, compatible with existing MySQL and stuff.

Upvotes: 0

Farfarak
Farfarak

Reputation: 1527

By the look of the code it looks like you just need to replicate sequence, you can do it by creating table in MySQL with AUTO_INCREMENT and use it for generating identity numbers.

Upvotes: 1

Related Questions