thames
thames

Reputation: 6083

Get Shard ID using bitwise operators in Postgres

I was reading Instagram's sharding technique and in the slides on page 136 it had the following code (I'm assuming Python??) for getting the shard_id from the generated id and decided to see if I could get the shard_id in Postgres, however I'm unable to. It may be because I'm not as familiar with bitwise operations or another slight difference in Python operators to Postgres operators.

# Python code:
# pulling shard ID from ID:
shard_id = id ^ ((id >> 23) << 23)
timestamp = EPOCH + id >> 23

Questions:

  1. Is there more operator differences between python and Postgres besides the ^ operator that is affecting the code?
  2. Is there more to getting the shard_id than what is in the Instagram code snip? Which I'm thinking is the issue as the seq_id would need to be removed as well? UPDATE: It looks like this is the case as (shard_bit | seq_id) = shard_id_conv.
  3. What is the correct way to get the shard_id in Postgres?
WITH var AS (
SELECT 1314220021721::bigint AS epoch
    , 1403496968580::bigint AS ms
    , (31341 % 2000)::bigint AS shard_id -- equals 1341
    , (5000 % 1024)::bigint AS seq_id
), bit AS (
SELECT *
    , ((ms) - epoch) << (64-41) AS ms_bit
    , shard_id << (64-41-13) AS shard_bit
FROM var
), val AS (
SELECT *
    , (ms_bit | shard_bit | seq_id) AS id
FROM bit
)
SELECT *
    , ms_bit::bit(64) AS ms_64
    , shard_bit::bit(64) AS shard_64
    , seq_id::bit(64) AS seq_64
    , id::bit(64) AS id_64

    -- "shard_id_conv" should equal "shard_id" (**and does not**, instead it's 1374088)
    -- note: '^' is changed to '#'
    -- shard_id_conv = 1374088
    , id # ((id >> 23) << 23) AS shard_id_conv 

    -- "ms_conv" should equal "ms" (and does)
    , epoch + (id >> 23) AS ms_conv

    -- "shard_seq" equals "shard_id_conv" (and does, but isn't the actual shard_id)
    -- shard_seq = 1374088
    , (shard_bit | seq_id) AS shard_seq
FROM val;
/* -- 64 BIT
0000101001100100101010010000110011010101100000000000000000000000 -- ms_bit
0000000000000000000000000000000000000000000101001111010000000000 -- shard_bit
0000000000000000000000000000000000000000000000000000001110001000 -- seq_bit
0000101001100100101010010000110011010101100101001111011110001000 -- id_bit
*/

Upvotes: 3

Views: 877

Answers (1)

thames
thames

Reputation: 6083

I think Instagram messed up the formula in the slides as we need to remove the 10 seq_id bits.

Please Note: # is the xor operator in Postgres. Instagram used ^ as the xor operator in their formula.

Incorrect:

id # ((id >> 23) << 23) AS shard_id

Correct:

(id # ((id >> 23) << 23)) >> 10 AS shard_id

The >> 10 removes the seq_id bits by doing a right bit shift.

If there is a better way to remove the 10 seq_id bits that performs better in Postgres, please answer.

Upvotes: 2

Related Questions