Digerdoden
Digerdoden

Reputation: 301

How do I create unique IDs, like YouTube?

I've always wondered how and why they do this...an example: http://youtube.com/watch?v=DnAMjq0haic

How are these IDs generated such that there are no duplicates, and what advantage does this have over having a simple auto incrementing numeric ID?

How do one keep it short but still keep it's uniqueness? The string uniqid creates are pretty long.

Upvotes: 28

Views: 23879

Answers (14)

chmac
chmac

Reputation: 12655

Kevin van Zonneveld has written an excellent article including a PHP function to do exactly this. His approach is the best I've found while researching this topic.

His function is quite clever. It uses a fixed $index variable so problematic characters can be removed (vowels for instance, or to avoid O and 0 confusion). It also has an option to obfuscate ids so that they are not easily guessable.

Upvotes: 26

Dev Bhaskar
Dev Bhaskar

Reputation: 139

Here is a small function that generates unique key randomly each time. It has very fewer chances to repeat same unique ID.

function uniqueKey($limit = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $randstring = '';
    for ($i = 0; $i < $limit; $i++) {
        $randstring .= $characters[rand(0, strlen($characters))];
    }
    return $randstring;
}

source: generate random unique IDs like YouTube or TinyURL in PHP

Upvotes: 3

DMCS
DMCS

Reputation: 31870

Try this: http://php.net/manual/en/function.uniqid.php

uniqid — Generate a unique ID...

Gets a prefixed unique identifier based on the current time in microseconds.

Caution This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using random_int(), random_bytes(), or openssl_random_pseudo_bytes() instead.

Warning This function does not guarantee uniqueness of return value. Since most systems adjust system clock by NTP or like, system time is changed constantly. Therefore, it is possible that this function does not return unique ID for the process/thread. Use more_entropy to increase likelihood of uniqueness...

Upvotes: 18

cocco
cocco

Reputation: 16706

This is NOT PHP but can be converted to php or as it's Javascript & so clinetside without the need to slow down the server.. it can be used as you post whatever needs a unique id to your php.

Here is a way to create unique ids limited to

9 007 199 254 740 992 unique id's

it always returns 9 charachters.

where iE2XnNGpF is 9 007 199 254 740 992

You can encode a long Number and then decode the 9char generated String and it returns the number.

basically this function uses the 62base index Math.log() and Math.Power to get the right index based on the number.. i would explain more about the function but ifound it some time ago and can't find the site anymore and it toke me very long time to get how this works... anyway i rewrote the function from 0.. and this one is 2-3 times faster than the one that i found. i looped through 10million checking if the number is the same as the enc dec process and it toke 33sec with this one and the other one 90sec.

var UID={
 ix:'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
 enc:function(N){
  N<=9007199254740992||(alert('OMG no more uid\'s'));
  var M=Math,F=M.floor,L=M.log,P=M.pow,r='',I=UID.ix,l=I.length,i;
  for(i=F(L(N)/L(l));i>=0;i--){
   r+=I.substr((F(N/P(l,i))%l),1)
  };
  return UID.rev(new Array(10-r.length).join('a')+r)
 },
 dec:function(S){
  var S=UID.rev(S),r=0,i,l=S.length,I=UID.ix,j=I.length,P=Math.pow;
  for(i=0;i<=(l-1);i++){r+=I.indexOf(S.substr(i,1))*P(j,(l-1-i))};
  return r
 },
 rev:function(a){return a.split('').reverse().join('')}
};

As i wanted a 9 character string i also appended a's on the generated string which are 0's.

To encode a number you need to pass a Number and not a string.

var uniqueId=UID.enc(9007199254740992);

To decode the Number again you need to pass the 9char generated String

var id=UID.dec(uniqueId);

here are some numbers

console.log(UID.enc(9007199254740992))//9 biliardi o 9 milioni di miliardi
console.log(UID.enc(1)) //baaaaaaaa 
console.log(UID.enc(10)) //kaaaaaaaa 
console.log(UID.enc(100)) //Cbaaaaaaa 
console.log(UID.enc(1000)) //iqaaaaaaa 
console.log(UID.enc(10000)) //sBcaaaaaa 
console.log(UID.enc(100000)) //Ua0aaaaaa 
console.log(UID.enc(1000000)) //cjmeaaaaa
console.log(UID.enc(10000000)) //u2XFaaaaa
console.log(UID.enc(100000000)) //o9ALgaaaa 
console.log(UID.enc(1000000000)) //qGTFfbaaa
console.log(UID.enc(10000000000)) //AOYKUkaaa 
console.log(UID.enc(100000000000)) //OjO9jLbaa
console.log(UID.enc(1000000000000)) //eAfM7Braa 
console.log(UID.enc(10000000000000)) //EOTK1dQca
console.log(UID.enc(100000000000000)) //2ka938y2a

As you can see there are alot of a's and you don't want that... so just start with a high number. let's say you DB id is 1 .. just add 100000000000000 so that you have 100000000000001

and you unique id looks like youtube's id 3ka938y2a

i don't think it's easy to fulfill the other 8907199254740992 unique id's

Upvotes: 0

ivanakimov
ivanakimov

Reputation: 157

I had a similar issue - I had primary id's in the database, but I did not want to expose them to the user - it would've been much better to show some sort of a hash instead. So, I wrote hashids.

Documentation: http://www.hashids.org/php/

Souce: https://github.com/ivanakimov/hashids.php

Hashes created with this class are unique and decryptable. You can provide a custom salt value, so others cannot decrypt your hashes (not that it's a big problem, but still a "good-to-have").

To encrypt a number your would do this:

require('lib/Hashids/Hashids.php');

$hashids = new Hashids\Hashids('this is my salt');
$hash = $hashids->encrypt(123);

Your $hash would now be: YDx

You can also set minimum hash length as the second parameter to the constructor so your hashes can be longer. Or if you have a complex clustered system you could even encrypt several numbers into one hash:

$hash = $hashids->encrypt(2, 456); /* aXupK */

(for example, if you have a user in cluster 2 and an object with primary id 456) Decryption works the same way:

$numbers = $hashids->decrypt('aXupK');

$numbers would then be: [2, 456].

The good thing about this is you don't even have to store these hashes in the database. You could get the hash from url once request comes in and decrypt it on the fly - and then pull by primary id's from the database (which is obviously an advantage in speed).

Same with output - you could encrypt the id's on the way out, and display the hash to the user.

EDIT:

  1. Changed urls to include both doc website and code source
  2. Changed example code to adjust to the main lib updates (current PHP lib version is 0.3.0 - thanks to all the open-source community for improving the lib)

Upvotes: 8

Jason
Jason

Reputation: 2049

base62 or base64 encode your primary key's value then store it in another field.

example base62 for primary key 12443 = 3eH

saves some space, which is why im sure youtube is using it.

doing a base62(A-Za-z0-9) encode on your PK or unique identifier will prevent the overhead of having to check to see if the key already exists :)

Upvotes: 9

Dimitrios Mistriotis
Dimitrios Mistriotis

Reputation: 2798

A way to do it is by a hash function with unique input every time.

example (you've tagged the question with php therfore):

$uniqueID = null
do {
  $uniqueID = sha1( $fileName + date() );
} while ( !isUnique($uniqueID) )

Upvotes: 2

mpen
mpen

Reputation: 283043

If you want short URLs and predictability is not a concern, you can convert the auto-incrementing ID to a higher base.

Upvotes: 3

Martin Jansen
Martin Jansen

Reputation: 276

Results of hash functions like SHA-1 or MD5 and GUIDs tend to become very long, which is probably something you don't want. (You've specifically mentioned YouTube as an example: Their identifiers stay relatively short even with the bazillion videos they are hosting.)

This is why you might want to look into converting your numeric IDs, which you are using behind the scenes, into another base when putting them into URLs. Flickr e.g. uses Base58 for their canonical short URLs. Details about this are available here: http://www.flickr.com/groups/api/discuss/72157616713786392/. If you are looking for a generic solution, have a look at the PEAR package Mathe_Basex.

Please note that even in another base, the IDs can still be predicted from outside of your application.

Upvotes: 1

Edward Kmett
Edward Kmett

Reputation: 29982

Consider using something like:

$id = base64_encode(md5(uniqid(),true));

uniqid will get you a unique identifier. MD5 will diffuse it giving you a 128 bit result. Base 64 encoding that will give you 6 bits per character in an identifier suitable for use on the web, weighing in around 23 characters and computationally intractable to guess. If you want to be even more paranoid ugrade from md5 to sha1 or higher.

Upvotes: 2

Frank V
Frank V

Reputation: 25429

I don't have a formula but we do this on a project that I'm on. (I can't share it). But we basically generate one character at a time and append the string.

Once we have a completed string, we check it against the database. If there is no other, we go with it. If it is a duplicate, we start the process over. Not very complicated.

The advantage is, I guess that of a GUID.

Upvotes: 0

n8wrl
n8wrl

Reputation: 19765

So much of this depends on what you need to do. How 'unique' is unique? Are you serving up the unique ID's, and do they mean something in your DB? if so, a sequential # might be ok.

ON the other hand, if you use sequential #'s someone could systematically steal your content by iterating thru the numbers.

There are filesystem commands that will generate unique file names - you could use those.

Or GUID's.

Upvotes: 1

User
User

Reputation: 30985

There should be a library for PHP to generate these IDs. If not, it's not difficult to implement it.

The advantage is that later you won't have name conflicts, when you try to reorganize or merge different server resources. With numeric ids you would have to change some of them to resolve conflicts and that will result in Url change leading to SEO hit.

Upvotes: 1

Sampson
Sampson

Reputation: 268414

Auto-incrementing can easily be crawled. These cannot be predicted, and therefore cannot be sequentially crawled.

I suggest going with a double-url format (Similar to the SO URLs):

yoursite.com/video_idkey/url_friendly_video_title

If you required both the id, and the title in the url, you could then use simple numbers like 0001, 0002, 0003, etc.

Generating these keys can be really simple. You could use the uniqid() function in PHP to generate 13 chars, or 23 with more entropy.

Upvotes: 4

Related Questions