Reputation: 36937
I know creating a short URL algorithm isn't as easy as hashing a URL and then chopping the hash down to some incremental version of itself, even though from the outside that is what appears to be happening. I've read a few articles on the idea and seen a couple of implementations in action, but none seem to worry about future-proofing.
So I am trying to find out how to approach this in PHP and how to avoid at least the common problems, from database conflicts to whatever else there may be to worry about beyond overall storage and database size.
One problem I will definitely face: the service I am creating takes user-submitted URLs from another service my buddy is building, and we track short URLs on a per-user basis. It's therefore possible for multiple users to submit the exact same long URL, yet we need a different short URL ID for each user who supplies it. Think of several users sharing a YouTube video that recently went viral.
So what's the best tactic for creating a short URL algorithm that won't run into many collisions and, at the same time, lets me query my DB with a handful of candidate short URLs to see whether they already exist?
Better yet, is there some way to create unique IDs with MySQL functionality that would, in concept, loop until one is unique and then create it?
I know I'm grasping at straws here and this is a rather open question, but I am trying to think tactically before getting deep into the build, only to find out later that I messed up big. I need some input up front to make sure I am taking a semi-sane approach.
Upvotes: 2
Views: 447
Reputation: 88
Here's the function shorturl($input) { ... } from the above SNIPPET IT page, translated from PHP to C#:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    // Wrapper class added only so the method compiles on its own.
    public static class ShortUrl {
        // 32-symbol alphabet: each 5-bit value maps to one character.
        private static readonly char[] Base32 = "abcdefghijklmnopqrstuvwxyz012345".ToCharArray();

        public static List<string> shorturl(string input) {
            string hex;
            using (var md5 = MD5.Create()) {
                // 32 lowercase hex characters. UTF-8 gives the same bytes as ASCII for
                // plain ASCII URLs, but does not collapse other characters to '?'.
                hex = string.Concat(md5.ComputeHash(Encoding.UTF8.GetBytes(input)).Select(b => b.ToString("x2")));
            }
            var output = new List<string>();
            // Each 8-hex-digit chunk of the digest yields one 6-character candidate.
            for (var i = 0; i < hex.Length / 8; i++) {
                // Keep the low 30 bits of the chunk (6 characters x 5 bits each).
                uint bits = Convert.ToUInt32(hex.Substring(i * 8, 8), 16) & 0x3FFFFFFFu;
                var code = new StringBuilder(6);
                for (var j = 0; j < 6; j++) {
                    code.Append(Base32[(int)(bits & 0x1Fu)]); // lowest 5 bits pick a symbol
                    bits >>= 5;
                }
                output.Add(code.ToString());
            }
            return output;
        }
    }
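For readers who don't use C#, the same steps can be sketched in Python. This is only an illustration of the technique in the code above (MD5 digest, four 8-hex-digit chunks, low 30 bits of each chunk mapped to six symbols); the name short_url_candidates and the choice of UTF-8 for encoding the input are my assumptions, not part of the original snippet.

```python
import hashlib

# Same 32 symbols as the base32 array in the C# version.
ALPHABET = "abcdefghijklmnopqrstuvwxyz012345"


def short_url_candidates(url: str) -> list[str]:
    """Return four 6-character candidate codes derived from the MD5 of url."""
    # Assumption: the URL is encoded as UTF-8 before hashing.
    hex_digest = hashlib.md5(url.encode("utf-8")).hexdigest()  # 32 hex chars
    candidates = []
    for start in range(0, len(hex_digest), 8):
        # Keep the low 30 bits of each 32-bit chunk (6 symbols x 5 bits).
        bits = int(hex_digest[start:start + 8], 16) & 0x3FFFFFFF
        code = ""
        for _ in range(6):
            code += ALPHABET[bits & 0x1F]  # lowest 5 bits pick one symbol
            bits >>= 5
        candidates.append(code)
    return candidates
```

Calling short_url_candidates("http://www.google.com/") returns a list of four strings, each six characters long. The codes are deterministic, so the same URL always yields the same four candidates, which is why a per-user scheme needs something extra (for example, including the user ID in the hashed input).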
Upvotes: 0
Reputation: 2757
You can use this short URL algorithm written in PHP; it generates four different "hashes" of the same URL.
Create a table like
id | original_url | short_url
------------------------------------------
1 http://www.google.com/ tm5kxb
When a user submits a URL to shorten, you call the function from the article and receive an array of four different hashes. Then you can use a query like:
SELECT id FROM {your_table} WHERE short_url = "{a_hash_from_the_function}"
If the query returns no results, then it means that there was no match and you can use this one. If the query returns a result, simply use another hash from the array, see if it exists, and so forth.
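A rough sketch of that "try the next hash" loop follows. It is not the article's code: it uses Python with an in-memory SQLite database purely for illustration, and the names open_demo_db, claim_short_code, and the urls table are hypothetical. It also swaps the SELECT-then-INSERT check for a slightly different technique: it puts a UNIQUE constraint on short_url and simply attempts the INSERT, treating a constraint violation as "already taken". That way two requests arriving at the same moment cannot both claim the same code.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like the one described above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls ("
        " id INTEGER PRIMARY KEY,"
        " original_url TEXT NOT NULL,"
        " short_url TEXT NOT NULL UNIQUE)"  # UNIQUE is what rejects duplicates
    )
    return conn


def claim_short_code(conn, original_url, candidates):
    """Insert the first candidate that is still free; return it, or None."""
    for code in candidates:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO urls (original_url, short_url) VALUES (?, ?)",
                    (original_url, code),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # this code is already taken, try the next candidate
    return None  # every candidate collided; the caller needs a fallback
```

If all candidates are taken, the function returns None, so a real service would need a fallback, such as re-hashing with extra input.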
Read the whole article, because at the bottom the author explains how to make your hashes less predictable. I would suggest using a different hashing algorithm than md5(), but you will have to experiment yourself. :)
Upvotes: 1
Reputation: 48121
Let's say you have a table urlShortened:
id | url
-----------------
1 http://ecc
Both fields have a UNIQUE index in your database, so if you need to know whether a URL already exists, just run a SELECT:
SELECT id FROM urlShortened WHERE url = 'http://anUrl'
The UNIQUE index will also prevent duplicate URLs from being inserted.
If you need unique URLs per user, just add another field (userId) and replace the unique index on url alone with a unique index on both fields (url, userId), so the same URL can appear once per user:
id | url | userId
-----------------------------
1 http://site1 1
2 http://site1 2
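As a minimal sketch of the per-user layout, assuming Python with an in-memory SQLite database instead of MySQL (the helper names open_demo_db and get_or_create_id are hypothetical), the composite unique index lets the same URL get a separate id for each user while a repeat submission by the same user reuses the existing row:

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create urlShortened with a unique index on the (url, userId) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urlShortened ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " url TEXT NOT NULL,"
        " userId INTEGER NOT NULL,"
        " UNIQUE (url, userId))"
    )
    return conn


def get_or_create_id(conn, url, user_id):
    """Return the row id for this (url, user) pair, inserting it if missing."""
    with conn:
        # SQLite spelling; the MySQL equivalent is INSERT IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO urlShortened (url, userId) VALUES (?, ?)",
            (url, user_id),
        )
    row = conn.execute(
        "SELECT id FROM urlShortened WHERE url = ? AND userId = ?",
        (url, user_id),
    ).fetchone()
    return row[0]
```

Because the database assigns the id, no retry loop is needed here: the id is unique by construction.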
Upvotes: 0