Reputation: 6083
I was reading Instagram's sharding technique and in the slides on page 136 it had the following code (I'm assuming Python??) for getting the shard_id from the generated id and decided to see if I could get the shard_id in Postgres, however I'm unable to. It may be because I'm not as familiar with bitwise operations or another slight difference in Python operators to Postgres operators.
# Python code:
# pulling shard ID from ID:
shard_id = id ^ ((id >> 23) << 23)
timestamp = EPOCH + id >> 23
Questions:
^
operator that is affecting the code?WITH var AS (
SELECT 1314220021721::bigint AS epoch
, 1403496968580::bigint AS ms
, (31341 % 2000)::bigint AS shard_id -- equals 1341
, (5000 % 1024)::bigint AS seq_id
), bit AS (
SELECT *
, ((ms) - epoch) << (64-41) AS ms_bit
, shard_id << (64-41-13) AS shard_bit
FROM var
), val AS (
SELECT *
, (ms_bit | shard_bit | seq_id) AS id
FROM bit
)
SELECT *
, ms_bit::bit(64) AS ms_64
, shard_bit::bit(64) AS shard_64
, seq_id::bit(64) AS seq_64
, id::bit(64) AS id_64
-- "shard_id_conv" should equal "shard_id" (**and does not**, instead it's 1374088)
-- note: '^' is changed to '#'
-- shard_id_conv = 1374088
, id # ((id >> 23) << 23) AS shard_id_conv
-- "ms_conv" should equal "ms" (and does)
, epoch + (id >> 23) AS ms_conv
-- "shard_seq" equals "shard_id_conv" (and does, but isn't the actual shard_id)
-- shard_seq = 1374088
, (shard_bit | seq_id) AS shard_seq
FROM val;
/* -- 64 BIT
0000101001100100101010010000110011010101100000000000000000000000 -- ms_bit
0000000000000000000000000000000000000000000101001111010000000000 -- shard_bit
0000000000000000000000000000000000000000000000000000001110001000 -- seq_bit
0000101001100100101010010000110011010101100101001111011110001000 -- id_bit
*/
Upvotes: 3
Views: 877
Reputation: 6083
I think Instagram messed up the formula in the slides as we need to remove the 10 seq_id bits.
Please Note: #
is the xor
operator in Postgres. Instagram used ^
as the xor operator in their formula.
Incorrect:
id # ((id >> 23) << 23) AS shard_id
Correct:
(id # ((id >> 23) << 23)) >> 10 AS shard_id
The >> 10
removes the seq_id bits by doing a right bit shift.
If there is a better way to remove the 10 seq_id bits that performs better in Postgres, please answer.
Upvotes: 2