Roman

Reputation: 3749

PHP base_convert for shortening URLs

I want to make my URLs shorter, similar to TinyURL or any other URL shortening service. I have the following type of links:

localhost/test/link.php?id=1000001
localhost/test/link.php?id=1000002

etc.

The IDs in the above links are the auto-incrementing IDs of rows in the database. The above links are mapped like this:

localhost/test/1000001
localhost/test/1000002

Now, instead of using the long IDs above, I would like to shorten them. I have found that I can use the base_convert() function. For example:

print base_convert(100000000, 10, 36);

//output will be "1njchs"

It looks pretty good, but I want to ask whether there are any disadvantages to using this function (e.g. slow performance), or whether there is a better approach to do the same thing (e.g. writing my own function to generate random ID strings).

Thanks.

Upvotes: 6

Views: 2031

Answers (3)

Alex Barker

Reputation: 4400

Unfortunately, I was unsatisfied with the answers here and elsewhere, because base_convert() and other floating-point-based conversion strategies lose an unacceptable amount of precision for cryptographic purposes. Furthermore, most of these implementations cannot handle numbers large enough for cryptographic applications. The following provides two methods of base conversion that should be safe for large bases, for example converting a base256 value (a binary string) to a base85 representation and back again.
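
To make the precision problem concrete, here is a minimal sketch. The 31-digit value is an arbitrary example chosen for illustration; anything well beyond PHP_INT_MAX should behave similarly, since base_convert() falls back to floating point for such inputs.

```php
<?php
// An arbitrary 31-digit decimal string, far beyond PHP_INT_MAX on 64-bit builds.
$big = "1234567890123456789012345678901";

// Round-trip through base 36 and back using base_convert().
$encoded   = base_convert($big, 10, 36);
$roundTrip = base_convert($encoded, 36, 10);

// The value is handled as a float internally, so the low-order digits
// are not expected to survive the round trip.
var_dump($roundTrip === $big);
```

For small auto-increment IDs this is not an issue; it only matters once the values no longer fit in a native integer.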

Using GMP

You can use GMP to accomplish this, at the cost of two otherwise unnecessary bin<->hex conversions and of being limited to base62.

<?php
// Not bits, bytes.
$data = openssl_random_pseudo_bytes(256);

$base62 = gmp_strval( gmp_init( bin2hex($data), 16), 62 );
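// GMP drops leading zeros, so pad the hex back to the original width before hex2bin().
$hex = gmp_strval( gmp_init($base62, 62), 16 );
$decoded = hex2bin( str_pad($hex, strlen($data) * 2, "0", STR_PAD_LEFT) );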

var_dump( strcmp($decoded, $data) === 0 ); // true

Pure PHP

If you would like to move beyond base62 to base85, or want a slight performance improvement, you will need something like the following.

<?php

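/**
* Divide a large number, stored as a string of digits in the given base,
* by $divisor in place and return the remainder.
* 
* @param string &$binary Digits, most significant first; overwritten with the quotient.
* @param int $base Base of the digits stored in $binary.
* @param int $divisor Value to divide by.
* @param int $start Offset of the first digit to process.
* @return int
*/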
function divmod(&$binary, $base, $divisor, $start = 0)
{
    /** @var int $size */
    $size = strlen($binary);

    // Do long division from most to least significant byte, keep remainder.
    $remainder = 0;
    for ($i = $start; $i < $size; $i++) {
        // Get the byte value, 0-255 inclusive.
        $digit = ord($binary[$i]);

        // Multiply the running remainder by the source base, then add the current digit.
        $temp = ($remainder * $base) + $digit;

        // Store the integer quotient back as the current digit.
        $binary[$i] = chr(intdiv($temp, $divisor));

        // Carry the remainder to the next byte.
        $remainder = $temp % $divisor;
    }

    return $remainder;
}

/**
* Produce a base62 encoded string from a large binary number.
* 
* @param string $binary
* @return string
*/
function encodeBase62($binary)
{
    $charMap = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $base = strlen($charMap);

    $size = strlen($binary);
    $start = $size - strlen(ltrim($binary, "\0"));

    $encoded = "";
    for ($i = $start; $i < $size; ) {
        // Do long division from most to least significant byte, keep remainder.
        $idx = divmod($binary, 256, $base, $i);

        $encoded = $charMap[$idx] . $encoded;

        if (ord($binary[$i]) == 0) {
            $i++; // Skip leading zeros produced by the long division.
        }
    }

    // Restore one "0" per leading zero byte (str_pad would only pad to a total length).
    $encoded = str_repeat("0", $start) . $encoded;

    return $encoded;
}

/**
* Produce a large binary number from a base62 encoded string.
* 
* @param string $ascii
* @return string
*/
function decodeBase62($ascii)
{
    $charMap = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $base = strlen($charMap);

    $size = strlen($ascii);
    $start = $size - strlen(ltrim($ascii, "0"));

    // Convert the ascii representation to binary string.
    $binary = "";
    for ($i = $start; $i < $size; $i++) {
        $byte = strpos($charMap, $ascii[$i]);
        if ($byte === false) {
            throw new OutOfBoundsException("Invalid character '{$ascii[$i]}' at offset {$i}");
        }

        $binary .= chr($byte);
    }

    $decode = "";
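    // $binary excludes the leading "0" characters, so bound the loop by its own length.
    for ($i = 0, $len = strlen($binary); $i < $len; ) {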
        // Do long division from most to least significant byte, keep remainder.
        $idx = divmod($binary, $base, 256, $i);

        $decode = chr($idx) . $decode;

        if (ord($binary[$i]) == 0) {
            $i++; // Skip leading zeros produced by the long division.
        }
    }

    $decode = ltrim($decode, "\0");
    // Restore one zero byte per leading "0" character (str_pad would only pad to a total length).
    $decode = str_repeat("\0", $start) . $decode;

    return $decode;
}

// Not bits, bytes.
$data = openssl_random_pseudo_bytes(256);

$base62 = encodeBase62($data);
$decoded = decodeBase62($base62);

var_dump( strcmp($decoded, $data) === 0 ); // true

Upvotes: 0

Sici

Reputation: 41

With base_convert() you can convert the string to a shorter code, and then with intval() you can recover the ID used to store the item in the database.

My code snippet:

$code = base_convert("long string", 10, 36);
$ID = intval($code, 36);
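
As a slightly fuller sketch of the same idea, the round trip can be wrapped in two small helpers. The names encodeId() and decodeId() are hypothetical, and the 12-character limit assumes a 64-bit PHP build, where every 12-character base-36 code still fits in a native integer.

```php
<?php
// Hypothetical helper names, for illustration only.

/** Encode a non-negative integer ID as a lowercase base-36 code. */
function encodeId(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException("ID must be non-negative");
    }
    return base_convert((string) $id, 10, 36);
}

/** Decode a base-36 code back to the integer ID, rejecting malformed input. */
function decodeId(string $code): int
{
    // At most 12 characters, so the value fits in a 64-bit integer (assumption).
    if (!preg_match('/^[0-9a-z]{1,12}$/', $code)) {
        throw new InvalidArgumentException("Malformed code");
    }
    return intval($code, 36);
}

echo encodeId(1000001), "\n";   // lflt
echo decodeId("1njchs"), "\n";  // 100000000
```

Validating the code before decoding matters because it arrives from the URL and is therefore user input.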

Upvotes: 3

Gedrox

Reputation: 3612

The function base_convert is fast enough, but if you want to do better, just store the encoded string inside the database.

Upvotes: 4
