Drew Stephens

Reputation: 17827

What character replacements should be performed to make base 64 encoding URL safe?

While looking into URL-safe base 64 encoding, I've found that it is not well standardized. Despite the large number of built-in functions PHP has, there isn't one for URL-safe base 64 encoding. On the manual page for base64_encode(), most of the comments suggest wrapping that function with strtr():

function base64_url_encode($input)
{
     return strtr(base64_encode($input), '+/=', '-_,');
}

The only Perl module I could find in this area is MIME::Base64::URLSafe (source), which performs the following replacement internally:

sub encode ($) {
    my $data = encode_base64($_[0], '');
    $data =~ tr|+/=|\-_|d;
    return $data;
}

Unlike the PHP function above, this Perl version drops the '=' (equals) character entirely rather than replacing it with ',' (comma). Equals is only a padding character, so the Perl module restores it as needed on decode, but the difference makes the two implementations incompatible.
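To make the incompatibility concrete, here is a small Python sketch that mimics both translations. The function names php_style and perl_style are made up for illustration; they only imitate the character handling of the two snippets above and are not ports of either library.

```python
import base64


def php_style(data: bytes) -> str:
    # Imitates strtr(base64_encode($input), '+/=', '-_,'):
    # '+' -> '-', '/' -> '_', '=' -> ','
    table = bytes.maketrans(b"+/=", b"-_,")
    return base64.b64encode(data).translate(table).decode("ascii")


def perl_style(data: bytes) -> str:
    # Imitates tr|+/=|\-_|d: '+' -> '-', '/' -> '_', and '=' is deleted
    table = bytes.maketrans(b"+/", b"-_")
    return base64.b64encode(data).translate(table, b"=").decode("ascii")


if __name__ == "__main__":
    for sample in (b"a", b"ab", b"abc"):
        print(sample, php_style(sample), perl_style(sample))
```

The two outputs agree only when the input length is a multiple of three, because that is the only case where base 64 produces no padding.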

Finally, the Python function urlsafe_b64encode(s) keeps the '=' padding. That prompted someone to post the following functions to strip and restore the padding, and they show up prominently in Google results for 'python base64 url safe':

from base64 import urlsafe_b64encode, urlsafe_b64decode

def uri_b64encode(s):
    return urlsafe_b64encode(s).strip('=')

def uri_b64decode(s):
    return urlsafe_b64decode(s + '=' * (4 - len(s) % 4))
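Two caveats about the snippet above: it was written for Python 2 strings, and its decode side appends four '=' characters when the input length is already a multiple of four. A Python 3 sketch of the same idea, assuming bytes in and an ASCII str out, might look like this (the function names are kept from the snippet; the type choices are my assumption):

```python
from base64 import urlsafe_b64encode, urlsafe_b64decode


def uri_b64encode(data: bytes) -> str:
    # Encode with the '-' and '_' alphabet, then drop the trailing '=' padding.
    return urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def uri_b64decode(text: str) -> bytes:
    # Restore just enough '=' to make the length a multiple of four.
    padding = "=" * (-len(text) % 4)
    return urlsafe_b64decode(text + padding)


if __name__ == "__main__":
    token = uri_b64encode(b"any carnal pleas")
    print(token, uri_b64decode(token))
```

The expression -len(text) % 4 evaluates to 0 when no padding is missing, so already-aligned input is passed through unchanged.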

The desire here is to have a string that can be included in a URL without further encoding, hence the ditching or translation of the characters '+', '/', and '='. Since there isn't a defined standard, what is the right way?

Upvotes: 6

Views: 4855

Answers (5)

ZZ Coder

Reputation: 75456

I don't think there is a right or wrong answer, but the most popular encoding is

'+/=' => '-_.'

This is widely used by Google and Yahoo (they call it Y64). Most of the URL-safe encoders I have used in Java and Ruby support this character set.
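As a rough illustration of the '+/=' => '-_.' mapping, here is a Python sketch built on translation tables. The names dot_padded_encode and dot_padded_decode are hypothetical, and the sketch does not try to confirm that it matches any particular vendor's scheme byte for byte.

```python
import base64

# Translation tables for the mapping '+/=' <-> '-_.'
TO_URL = bytes.maketrans(b"+/=", b"-_.")
FROM_URL = bytes.maketrans(b"-_.", b"+/=")


def dot_padded_encode(data: bytes) -> str:
    # Standard base 64 first, then swap the three awkward characters.
    return base64.b64encode(data).translate(TO_URL).decode("ascii")


def dot_padded_decode(text: str) -> bytes:
    # Undo the swap, then decode as ordinary base 64.
    return base64.b64decode(text.encode("ascii").translate(FROM_URL))


if __name__ == "__main__":
    token = dot_padded_encode(b"\xfb\xff")
    print(token, dot_padded_decode(token))
```

Because the padding is kept (as '.'), the output length stays a multiple of four and decoding needs no length arithmetic.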

Upvotes: 9

Grant Wagner

Reputation: 25931

There does appear to be a standard: RFC 3548, Section 4, Base 64 Encoding with URL and Filename Safe Alphabet:

This encoding is technically identical to the previous one, except for the 62:nd and 63:rd alphabet character, as indicated in table 2.

+ and / should be replaced by - (minus) and _ (understrike) respectively. Any incompatible libraries should be wrapped so they conform to RFC 3548.

Note that this requires that you URL encode the (pad) = characters, but I prefer that over URL encoding the + and / characters from the standard base64 alphabet.
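For what it's worth, Python's urlsafe_b64encode already uses the '-' and '_' alphabet, so a sketch of this approach only has to percent-encode the padding. The helper names below are made up for the example:

```python
from base64 import urlsafe_b64encode, urlsafe_b64decode
from urllib.parse import quote, unquote


def rfc_style_for_url(data: bytes) -> str:
    # '-' and '_' pass through quote() untouched; '=' becomes '%3D'.
    token = urlsafe_b64encode(data).decode("ascii")
    return quote(token, safe="")


def rfc_style_from_url(text: str) -> bytes:
    # Undo the percent-encoding, then decode with the URL-safe alphabet.
    return urlsafe_b64decode(unquote(text))


if __name__ == "__main__":
    encoded = rfc_style_for_url(b"\xfb\xff")
    print(encoded, rfc_style_from_url(encoded))
```

Only the trailing padding ever needs escaping here, which adds at most four characters to the encoded string.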

Upvotes: 11

Ateş Göral

Reputation: 140050

If you're asking about the correct way, I'd go with proper URL-encoding as opposed to arbitrary replacement of characters. First base64-encode your data, then further encode special characters like "=" with proper URL-encoding (i.e. %XX escapes, so "=" becomes %3D).

Upvotes: 1

Jon Benedicto

Reputation: 10582

I'd suggest running the output of base64_encode through urlencode. For example:

function base64_encode_url( $str )
{
    return urlencode( base64_encode( $str ) );
}
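For comparison with the other languages in this thread, the same idea in Python might be sketched as follows. The function names are hypothetical, and quote() with safe="" is used here as a stand-in for PHP's urlencode for the three characters that matter ('+', '/', '='):

```python
import base64
from urllib.parse import quote, unquote


def base64_encode_url(data: bytes) -> str:
    # Standard base 64, then percent-encode '+', '/' and '='.
    return quote(base64.b64encode(data).decode("ascii"), safe="")


def base64_decode_url(text: str) -> bytes:
    # Reverse the percent-encoding, then decode as ordinary base 64.
    return base64.b64decode(unquote(text))


if __name__ == "__main__":
    encoded = base64_encode_url(b"\xfb\xff")
    print(encoded, base64_decode_url(encoded))
```

The trade-off is size: every '+', '/' or '=' grows from one character to three, whereas a character substitution keeps the length unchanged.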

Upvotes: 2

Fragsworth

Reputation: 35497

Why don't you try wrapping it in a urlencode()? Documentation here.

Upvotes: 0
