Reputation: 349
So I was wondering about youtube's url. Specially the video id watch?v=XZmGGAbHqa0
. Same for similar tinyurl services.
I came across Tom's video of Will YouTube Ever Run Out Of Video IDs? That was quite interesting. But generating a seemingly random number in a random way, check for duplicates looked quite expensive.
In case you haven't watched the video, youtube generates a large unique number and base64 encode it. Not sure if the whole process is that simple inside youtube but I'm trying to write something similar.
I came across a mathematical function called Modular Multiplicative Inverse.
Using this along with a bit of Extended Euclidean algorithm will generate a unique number from our given input. Inverse is possible too, But someone won't be easily able to inverse or brute force without knowing the seeds. So we can easily get a large random number from our subsequent id in database. Maybe even base64 it.
But I have confusions.
As tom mentioned in the video, syncing a subsequent id across multiple servers can lead to problems and overheads. But if we don't make it subsequent, won't it be a tedious and expensive task to find a unique id? We have to check if the id is available in db.
Should I avoid mixing in url with subsequent ids? Or masking it with something like that Multiplicative Inverse is good?
Why would I want to base64 it? To save storage or to save url space? I would still need to generate something seemingly random, unique and be able to quickly generate it without duplicating and search it.
Any better ways to achieve what I'm trying? I did checked all the similar so questions and a few blog posts. On of which is this
Sorry If I couldn't explain well, I'm quite confused what to ask.
Upvotes: 0
Views: 230
Reputation: 74655
This is why you bypass the issue of ensuring uniqueness of randomly generated numbers entirely and use a pseudorandom permutation (PRP) family to transform a pre-coordinated counter.
A PRP family is like a PRF family in that it is a cipher that takes a key and a plaintext and produces a ciphertext except that it has a one-to-one map between plaintexts and ciphertexts.
This lets one use a Twitter Snowflake like design and then simply encode the internal sequential identifiers into non-sequential external identifiers.
See Algorithm to turn numeric IDs in to short, different alphanumeric codes for an implementation in Python.
Also
M. Luby and C. Rackoff. How to construct pseudorandom permutations from pseudorandom functions. SIAM J. Comput., 17(2):373–386, Apr. 1988.
Upvotes: 1