chris

Reputation: 36937

When creating a short URL algorithm that stores URLs in a DB, what are the potential problems to work around?

I know creating a short URL algorithm isn't as easy as hashing a URL and then chopping the hash down to some incremental version of itself, even though from the outside that is what appears to be happening. I've read a few articles on the idea and seen a couple in action as well, but none seem to worry about future-proofing it.

So I am trying to find out how to approach this with PHP and how to avoid at least the common problems, from database conflicts to whatever else there may be to worry about besides overall storage and database size.

One problem I will definitely face: the service I am creating takes user-supplied URLs from another service my buddy is building, and we track short URLs on a per-user basis. So it's possible that multiple users will submit the exact same long URL, but we need a different short URL ID for each user who supplies it. Think of several users sharing a YouTube video that recently went viral.

So what's the best tactic for creating a short URL algorithm that won't run into many clashes, and at the same time will let me query my DB with a handful of possible short URLs to see whether they already exist?

Better yet, is there some way to create unique IDs with MySQL functionality that would, in concept, loop until one is unique and then create it?

I know I'm grasping at straws here, and this is a rather open question. But I am trying to think tactically before getting deep into the build process, only to find out later that I messed up big. I need some input up front to make sure I am taking a semi-sane approach.

Upvotes: 2

Views: 447

Answers (3)

Tim

Reputation: 88

Here's the shorturl($input) function from the above SNIPPET IT page, translated from PHP to C#:

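using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class UrlShortener
{
    // 32-character alphabet: each output character encodes 5 bits.
    private static readonly char[] Base32 = {
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
        'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
        'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
        'y', 'z', '0', '1', '2', '3', '4', '5'
    };

    // Returns four 6-character candidates derived from the MD5 of the input.
    public static List<string> shorturl(string input)
    {
        using var md5 = MD5.Create();
        // Hash the UTF-8 bytes (Encoding.ASCII would turn non-ASCII characters into '?').
        // The result is 32 lowercase hex characters, like PHP's md5().
        var hex = string.Concat(md5.ComputeHash(Encoding.UTF8.GetBytes(input))
                                   .Select(b => b.ToString("x2")));
        var output = new List<string>();

        // Split the hex digest into 8-character (32-bit) chunks.
        for (var i = 0; i < hex.Length / 8; i++)
        {
            var subHex = hex.Substring(i * 8, 8);
            // Keep the low 30 bits: 6 characters x 5 bits = 30 bits.
            var value = 0x3FFFFFFFu & Convert.ToUInt32(subHex, 16);
            var sb = new StringBuilder(6);

            for (var j = 0; j < 6; j++)
            {
                // Map the lowest 5 bits to a character, then shift them out.
                sb.Append(Base32[value & 0x1F]);
                value >>= 5;
            }

            output.Add(sb.ToString());
        }

        return output;
    }
}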

Upvotes: 0

Pateman

Reputation: 2757

You can use this short URL algorithm written in PHP; it generates four different "hashes" of the same URL.

Create a table like this:

id | original_url           | short_url
---+------------------------+----------
1  | http://www.google.com/ | tm5kxb

When a user submits a URL to shorten, you call the function from the article and receive an array of four different hashes. Then you can use a query like:

SELECT id FROM {your_table} WHERE short_url = "{a_hash_from_the_function}"

If the query returns no rows, that hash is not in use yet and you can take it. If it returns a row, try the next hash from the array, check whether that one exists, and so forth.
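A minimal sketch of that check-then-use loop, using Python's built-in sqlite3 module as a stand-in for MySQL. The table name short_urls, the function claim_short_url and the candidate strings are hypothetical; in practice the candidates would come from the article's function. The UNIQUE constraint on short_url is an assumption added here so that two concurrent requests cannot both claim the same code between the SELECT and the INSERT.

```python
import sqlite3
from typing import Optional, Sequence


def claim_short_url(conn: sqlite3.Connection, original_url: str,
                    candidates: Sequence[str]) -> Optional[str]:
    """Try each candidate hash in order and store the first one that is free.

    Returns the claimed short code, or None if every candidate is taken.
    """
    for code in candidates:
        # Same check as the SELECT above, but with a bound parameter.
        row = conn.execute(
            "SELECT id FROM short_urls WHERE short_url = ?", (code,)
        ).fetchone()
        if row is not None:
            continue  # already used: try the next hash
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO short_urls (original_url, short_url) VALUES (?, ?)",
                    (original_url, code),
                )
            return code
        except sqlite3.IntegrityError:
            # Another request claimed this code between our SELECT and INSERT.
            continue
    return None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE short_urls ("
    " id INTEGER PRIMARY KEY,"
    " original_url TEXT NOT NULL,"
    " short_url TEXT NOT NULL UNIQUE)"
)
print(claim_short_url(conn, "http://www.google.com/", ["tm5kxb", "abc123"]))  # tm5kxb
print(claim_short_url(conn, "http://example.com/", ["tm5kxb", "abc123"]))     # abc123
print(claim_short_url(conn, "http://example.org/", ["tm5kxb", "abc123"]))     # None
```

When all candidates are taken the function returns None, so the caller still needs some fallback for that case.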

Read the whole article: at the bottom, the author explains how to make your hashes more unpredictable. I would suggest using a different hashing algorithm than md5(), but you will have to experiment yourself. :)
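The splitting scheme itself is not tied to MD5. Below is a rough Python port with the hash algorithm as a parameter; the name shorturl_candidates and the SHA-256 default are illustrative choices, not the article's code. A 32-hex-character MD5 digest gives four 6-character candidates, while a 64-hex-character SHA-256 digest gives eight. Note that switching hashes alone does not make the codes unpredictable, since anyone can hash the same URL.

```python
import hashlib
from typing import List

# Same 32-character alphabet as the original function: 5 bits per character.
ALPHABET = "abcdefghijklmnopqrstuvwxyz012345"


def shorturl_candidates(url: str, algorithm: str = "sha256") -> List[str]:
    """Split a hex digest into 8-character chunks and turn each into a 6-char code."""
    hex_digest = hashlib.new(algorithm, url.encode("utf-8")).hexdigest()
    candidates = []
    for i in range(len(hex_digest) // 8):
        # Keep the low 30 bits of each 32-bit chunk (6 characters x 5 bits).
        value = int(hex_digest[i * 8:(i + 1) * 8], 16) & 0x3FFFFFFF
        code = []
        for _ in range(6):
            code.append(ALPHABET[value & 0x1F])  # lowest 5 bits -> one character
            value >>= 5
        candidates.append("".join(code))
    return candidates


print(len(shorturl_candidates("http://www.google.com/", "md5")))     # 4
print(len(shorturl_candidates("http://www.google.com/", "sha256")))  # 8
```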

Upvotes: 1

dynamic

Reputation: 48121

Let's say you have a table urlShortened:

id | url
---+-----------
1  | http://ecc

Both fields are indexed and UNIQUE in your database, so if you need to know whether a URL already exists, just run a SELECT:

SELECT id FROM urlShortened WHERE url = 'http://anUrl'

The unique index also prevents duplicate URLs from being inserted.
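A small sketch of both points, again with sqlite3 standing in for MySQL: look the URL up first, insert only if it is missing, and let the unique index reject a duplicate if two requests race. The helper get_or_create_id is hypothetical; the table mirrors urlShortened above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE creates the index on url; INTEGER PRIMARY KEY makes id unique as well.
conn.execute("CREATE TABLE urlShortened (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)")


def get_or_create_id(conn: sqlite3.Connection, url: str) -> int:
    """Return the existing id for url, inserting a new row only if needed."""
    row = conn.execute("SELECT id FROM urlShortened WHERE url = ?", (url,)).fetchone()
    if row is not None:
        return row[0]
    try:
        with conn:
            cur = conn.execute("INSERT INTO urlShortened (url) VALUES (?)", (url,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # A concurrent insert won the race; the unique index rejected ours.
        return conn.execute("SELECT id FROM urlShortened WHERE url = ?", (url,)).fetchone()[0]


first = get_or_create_id(conn, "http://anUrl")
again = get_or_create_id(conn, "http://anUrl")
print(first == again)  # True

try:
    with conn:
        conn.execute("INSERT INTO urlShortened (url) VALUES (?)", ("http://anUrl",))
except sqlite3.IntegrityError:
    print("duplicate rejected")  # the unique index blocks a second identical row
```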

If you need unique URLs per user, just add another field (userId) and create a unique index on both fields together (url, userId), dropping the unique index on url alone; otherwise the second row below would be rejected:

id | url          | userId
---+--------------+-------
1  | http://site1 | 1
2  | http://site1 | 2
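A minimal sketch of the per-user variant in sqlite3 (in MySQL the same constraint can be declared as UNIQUE (url, userId) inside CREATE TABLE). The helper add_url is hypothetical. Because each (url, userId) pair gets its own row, two users sharing the same viral video each get a different id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urlShortened ("
    " id INTEGER PRIMARY KEY,"
    " url TEXT NOT NULL,"
    " userId INTEGER NOT NULL,"
    " UNIQUE (url, userId))"  # composite unique index; url alone is not unique
)


def add_url(conn: sqlite3.Connection, url: str, user_id: int) -> bool:
    """Insert (url, user_id); return False if that user already has this URL."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO urlShortened (url, userId) VALUES (?, ?)", (url, user_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_url(conn, "http://site1", 1))  # True
print(add_url(conn, "http://site1", 2))  # True: same URL, different user
print(add_url(conn, "http://site1", 1))  # False: duplicate for user 1
```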

Upvotes: 0
