Alberto Zaccagni
Alberto Zaccagni

Reputation: 31560

Base10 to base64 url shortening

I'm coding an url shortener function for a project in which I'm learning php, here is the code (btw I suppose that global here is not a good thing to do :P):

$alphabet = array(1 => "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
                "A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
                "0","1","2","3","4","5","6","7","8","9","_","-");

function shorten($id){
    global $alphabet;
    $shortenedId = "";
    while($id>0){
        $remainder = $id % 64;
        $id = $id / 64;     
        $shortenedId = $alphabet[$remainder].$shortenedId;
    }
    return $shortenedId;
}

The code is taken from this Wikipedia article and adapted to php. My problem is that when I pass a multiple of 64 to the function I get a wrong (for my purpose) result, for instance 128 returns b which is not correct, it should have been aaa, but that's too long for a 3-digit number.

Also I'm starting to think that there's something wrong in this code, if I pass 1'000'000'000'000 as $id I get nItOq... I feel it's wrong because a url shortening service like bit.ly returns a 6 number id if I use it, and I don't think that this algorithm is better than theirs.

So, two questions:

Upvotes: 5

Views: 11297

Answers (8)

Alan Davison
Alan Davison

Reputation: 3

This is a variation of Nathans code to handle large integers greater than PHP_INT_MAX.

This uses the BC Maths Functions that should be built-in on Windows servers, but this needs to be enabled as an optional extension on Unix servers. This solution also requires a couple of custom BC functions to handle floor and round functions that I copied from the post by Alix Axel.

function shorten($value, $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-') {
    $base = strlen($alphabet);
    $result = '';
    while ($value) {
        $mod = bcmod($value, $base);
        $value = bcfloor(bcdiv($value, $base));
        $result = $alphabet[$mod] . $result;
    }
    return $result;
  }

function lengthen($value, $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-') {
    $base= strlen($alphabet);
    $result = '';
    for($i = 0, $limit = strlen($value); $i < $limit; $i++) {
        $result = bcadd(bcmul($base, $result), strpos($alphabet, $value[$i]));
    }
    return $result;
}

function bcceil($number) {
    if (strpos($number, '.') !== false) {
        if (preg_match("~\.[0]+$~", $number)) return bcround($number, 0);
        if ($number[0] != '-') return bcadd($number, 1, 0);
        return bcsub($number, 0, 0);
    }
    return $number;
}

function bcfloor($number) {
    if (strpos($number, '.') !== false) {
        if (preg_match("~\.[0]+$~", $number)) return bcround($number, 0);
        if ($number[0] != '-') return bcadd($number, 0, 0);
        return bcsub($number, 1, 0);
    }
    return $number;
}

function bcround($number, $precision = 0) {
    if (strpos($number, '.') !== false) {
        if ($number[0] != '-') return bcadd($number, '0.' . str_repeat('0', $precision) . '5', $precision);
        return bcsub($number, '0.' . str_repeat('0', $precision) . '5', $precision);
    }
    return $number;
}

Examples running PHP 5.6 on Windows (32 bit)

foreach ([0, 1, 9, 10, 115617, bcsub(PHP_INT_MAX, 1), PHP_INT_MAX, bcadd(PHP_INT_MAX, 1234567890)] as $value) {
    $short = shorten($value);
    $reversed = lengthen($short);
    print shorten($value) . " ($value)<br>";
    if ("$value" !== $reversed) {
        print 'ERROR REVERSING VALUE<br>';
    }
}

Outputs

0 (0)
1 (1)
9 (9)
a (10)
sex (115617)
1----_ (2147483646)
1----- (2147483647)
39Bwbh (3382051537)

If the ID is public, avoid using vowels in the string (115617 is shortened to sex for example). This would be the base 54 version that should provide safe words.

$alphabet = '0123456789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ_-';

Upvotes: 0

malhal
malhal

Reputation: 30582

How about this:

function shorten_int($id){
    $hex = base_convert(id, 10, 16);
    $base64 = base64_encode(pack('H*', $hex));
    //$base64 = str_replace("/", "_", $base64); // remove unsafe url chars
    //$base64 = str_replace("+", "-", $base64);
    //$base64 = rtrim($base64, '='); // Remove the padding "=="
    $replacePairs = array('/' => '_',
                          '+' => '-',
                          '=' => '');
    $base64 = strtr($base64, $replacePairs); // optimisation
    return $base64;
}

Upvotes: 2

diyism
diyism

Reputation: 12935

These two functions are very convenient, thanks to @malhal:

function shorten_int($id)
{
    $id=dechex($id);
    $id=strlen($id)%2===0?hex2bin($id):hex2bin('0'.$id);
    $id=base64_encode($id);
    $id=strtr($id, array('/'=>'_', '+'=>'-', '='=>''));
    return $id;
}

function unshorten_int($id)
{
    $id=strtr($id, array('-'=>'+', '_'=>'/'));
    $id=base64_decode($id);
    $id=bin2hex($id);
    return base_convert($id, 16, 10);
}

echo shorten_int(43121111)."\n";
echo unshorten_int(shorten_int(43121111))."\n";

Upvotes: 1

Inkeliz
Inkeliz

Reputation: 1091

You can use the pack.

$int = 1129717211140920362;

$byte = pack('J*', $int);    
echo base64_encode($byte); //= D62P0WqzFCo=

It will result in D62P0WqzFCo=, it is correct, because the $int is an int64 and uses 64 bits. The Base64 uses 6 bits for each character, so they need ~11 characters.

To decode use:

$base64 = 'D62P0WqzFCo=';

$byte = base64_decode($base64);
echo unpack('J*',  $byte)[1]; //= 1129717211140920362

It will return 1129717211140920362. ;)


It was based in the answer on the Stackoverflow in Portuguese.

Upvotes: -1

rgbflawed
rgbflawed

Reputation: 2137

In case you're looking for the opposite function to take a base64 number and convert to base10, here's some PHP based off of the JavaScript in this answer: How to convert base64 to base10 in PHP?

function lengthen($id) {
    $alphabet='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';

    $number=0;
    foreach(str_split($id) as $letter) {
        $number=($number*64) + strpos($alphabet,$letter);
    }
    return $number;
}

Upvotes: 5

Gabe Sumner
Gabe Sumner

Reputation: 4998

Paul Greg created some PHP code that converts from Base-10 to another base. This can be tested and the code downloaded here:

http://www.pgregg.com/projects/php/base_conversion/base_conversion.php

I'm using this approach to convert the database row id's to Base-64. Once these numbers have been shortened they can be used in the URL. [details]

Upvotes: 1

nathan
nathan

Reputation: 5452

Just a couple of little tweaks needed, the main two were to make the the alphabet zero indexed rather than one-indexed, and to subtract the remainder from the id before dividing

function shorten($id)
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';
    $shortenedId = '';
    while($id>0) {
        $remainder = $id % 64;
        $id = ($id-$remainder) / 64;     
        $shortenedId = $alphabet{$remainder} . $shortenedId;
    };
    return $shortenedId;
}

and here's a further modified version which... well I just like

function shorten($id, $alphabet='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-')
{
    $base = strlen($alphabet);
    $short = '';
    while($id) {
        $id = ($id-($r=$id%$base))/$base;     
        $short = $alphabet{$r} . $short;
    };
    return $short;
}

EDIT: sorted concatenation to be the same as the OPs

Upvotes: 14

Richard Knop
Richard Knop

Reputation: 83705

By the way, check out the base_convert() function (http://php.net/manual/en/function.base-convert.php):

echo base_convert(1000000000, 10, 36);

36 is the longest base it can convert to, though. But in the comments section I found this:

function dec2any( $num, $base, $index=false ) {
    if (! $base ) {
        $base = strlen( $index );
    } else if (! $index ) {
        $index = substr( "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" ,0 ,$base );
    }
    $out = "";
    for ( $t = floor( log10( $num ) / log10( $base ) ); $t >= 0; $t-- ) {
        $a = floor( $num / pow( $base, $t ) );
        $out = $out . substr( $index, $a, 1 );
        $num = $num - ( $a * pow( $base, $t ) );
    }
    return $out;
}

echo dec2any(1000000000, 64, "_-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

Maybe it will help?

Upvotes: 1

Related Questions