SpaceFace
SpaceFace

Reputation: 1552

Unique url from primary key

I'm trying to create a URL similar to youtube's /v=xxx in look and in behavior. In short, users will upload files and be able to access them through that URL. This URL code needs to be some form of the database's primary key so the page can gather the data needed. I'm new to databases and this is more of a database problem than anything.

In my database I have a auto increment primary key which file data is accessed by. I want to use that number to to create the URL for files. I started looking into different hash functions, but I'm worried about collisions. I don't want the same URL for two different files.

I also considered using uniqid() as my primary key CHAR(13), and just use that directly. But with this I'm worried about efficiency. Also looking around I can't seem to find much about it so it's probably a strange idea. Not to mention I would need to test for collisions when ids are generated which can be inefficient. Auto increment is a lot easier.

Is there any good solution to this? Will either of my ideas work? How can I generate a unique URL from an auto incremented primary key and avoid collisions?

I'm leaning toward my second idea, it won't be greatly efficient, but the largest performance drawbacks are caused when things need to be added to the database (testing for collisions), which for the end user, only happens once. The other performance drawback will probably be in the actual looking of of chars instead of ints. But I'm mainly worried that it's bad practice.

EDIT:

A simple solution would to be just to use the auto incremented value directly. Call me picky, but that looks kind of ugly.

Upvotes: 1

Views: 1053

Answers (4)

ChrisW
ChrisW

Reputation: 5068

I wanted to do something similar (but with articles, not uploaded documents), and came up with something a bit different:

  • take a prime number [y] (much) larger than the max number [n] of documents there will ever be (e.g. 25000 will be large enough for the total number of documents, and 1000099 is a much larger prime number than 25001)
  • for the current document id [x]: (x*y) modulus (n+1)
  • this will generate a number between 1 and n that is never duplicated

although the url may look like a traditional primary key, it does have the slight advantage that each subsequent document will have a id which is totally unrelated to the previous one; some people also argue that not including the primary key also has a very slight security advantage...

Upvotes: 0

Starx
Starx

Reputation: 78981

Generating non colliding short hash will indeed be a headache. So, instead the slug format of Stackoverflow is very promising and is guaranteed to produce non duplicate url.

For example, this very same question has

https://stackoverflow.com/questions/11991785/unique-url-from-primary-key

Here, it has unique primary key and also a title to make it more SE friendly.


However as commented, they are few previously asked question, that might clear out, why? what you are trying is better left out.

  1. How to generate a unique hash for a URL?
  2. Create Tinyurl style hash

Creating short hashes increases the chances a collision a lot, so better user base64 or sha512 functions to create a secured hash.

Upvotes: 1

O. Jones
O. Jones

Reputation: 108651

This is fairly straightforward if you're willing to use a random number to generate your short URL. For example, you can do this:

 SELECT BASE64_ENCODE(CAST(RAND()*1000000 AS UNSIGNED INTEGER)) AS tag

This is capable of giving you one million different tags. To get more possible tags, increase the value by which the RAND() number is multiplied. These tag values will be hard to predict.

To make sure you don't get duplicates you need to dedupe the tag values. That's easy enough to do but will require logic in your program. Insert the tag values into a table which uses them as a primary key. If your insert fails, try again, reinvoking RAND().

If you get close to your maximum number of tags you'll start having lots of insert failures (tag collisions).

BASE64_ENCODE comes from a stored function you need to install. You can find it here:

http://wi-fizzle.com/downloads/base64.sql

If you're using MySQL 5.6 or higher you can use the built-in TO_BASE64 function.

Upvotes: 0

Green Black
Green Black

Reputation: 5084

You can simply make a hash of the time, and afterwards check that hash (or part of that hash in your DB. If you set an index on that field in your DB (and make sure the hash is long enough to not make a lot of collisions), it won't be an issue at all time wise.

<?php

$hashChecked = false;

while( $hashChecked === false ){
  $hash = substr( sha1(time().mt_rand(9999,99999999)), 0, 8);  //varchar 8 (make sure that is enough with a very big margin)
  $q = mysql_query("SELECT `hash` FROM `tableName` WHERE `hash` = '".$hash."'");
  $hashChecked = mysql_num_rows() > 0 ? false : true;
}

mysql_query("INSERT INTO `tableName` SET `hash` = '".$hash."'");

Upvotes: 0

Related Questions