cyberwombat

Reputation: 40133

Why do base64/openssl use a padding character of 'K' instead of '='?

I've noticed that PHP's base64_encode uses '=' as a padding character. According to Wikipedia, the different variants use either '=' or no padding at all. However, the CLI base64 command and openssl enc -base64 both appear to use 'K' as the padding. I am looking for information on why this happens and which implementations they use.

echo base64_encode('hello'); // aGVsbG8=
echo hello | base64 -i -        # aGVsbG8K
openssl enc -base64 <<< hello   # aGVsbG8K

Upvotes: 14

Views: 5466

Answers (1)

Marinos An

Reputation: 10868

K is not a padding character. It is the encoding of the trailing newline that the shell commands (echo and the <<< here-string) append to the input.

echo hello | openssl enc -base64 # aGVsbG8K
echo -n hello | openssl enc -base64 # aGVsbG8=
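The same comparison can be reproduced outside the shell. The following is a minimal Python sketch using the standard base64 module; the variable names are only illustrative. It encodes the same five letters with and without a trailing newline byte.

```python
import base64

# "hello" followed by a newline, which is what `echo hello` writes to the pipe
with_newline = base64.b64encode(b"hello\n").decode("ascii")

# "hello" alone, which is what `echo -n hello` writes
without_newline = base64.b64encode(b"hello").decode("ascii")

print(with_newline)     # aGVsbG8K
print(without_newline)  # aGVsbG8=
```

PHP's base64_encode('hello') falls into the second case, because the string literal contains no newline.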

UPDATE:

Technical explanation

Base64 splits the input bitstream into 6-bit chunks instead of 8-bit bytes. Each 6-bit chunk is then mapped to a character through a dedicated table (not the ASCII table) of 64 printable characters, hence the name of the encoding: A-Z for values 0-25, a-z for 26-51, 0-9 for 52-61, then + and / for 62 and 63.
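The 64-character table can be written down directly. This is a small Python sketch of the standard Base64 alphabet; ALPHABET and b64_char are hypothetical names chosen for this illustration, not part of any library.

```python
import string

# Standard Base64 alphabet: index 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> +, 63 -> /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_char(chunk: str) -> str:
    """Map a 6-bit chunk, given as a string of 0s and 1s, to its Base64 character."""
    return ALPHABET[int(chunk, 2)]


print(len(ALPHABET))       # 64
print(b64_char("011010"))  # a
print(b64_char("001010"))  # K
```

The chunk 001010 has the value 10, and index 10 in the table is K, which is where the K in the question comes from.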

Let's see it in practice (print-bits and print-b64-bits are imaginary commands).

With newline:

echo hello | print-bits

# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)

echo hello | print-b64-bits

# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8) 001010 (K)


No newline:

echo -n hello | print-bits

# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o)

echo -n hello | print-b64-bits

# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8)

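In the latter case there are only 7 output characters. (The last chunk holds just 4 input bits, 1111, and is zero-padded to 111100.) One = character needs to be appended to bring the length to 8, a multiple of 4.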
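Since print-bits and print-b64-bits are imaginary, here is a rough Python sketch of what they might do, plus a hand-rolled encoder that adds the = padding. All function names are hypothetical, and the encoder is only meant to mirror the explanation above, not to replace the standard base64 module.

```python
import string

# Standard Base64 alphabet (values 0..63)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def to_bits(data: bytes) -> str:
    """Concatenate the 8-bit binary form of every input byte."""
    return "".join(f"{byte:08b}" for byte in data)


def print_bits(data: bytes) -> str:
    """Stand-in for the imaginary print-bits: one 8-bit group per byte."""
    return " ".join(f"{byte:08b}" for byte in data)


def print_b64_bits(data: bytes) -> str:
    """Stand-in for the imaginary print-b64-bits: 6-bit chunks with their characters."""
    bits = to_bits(data)
    parts = []
    for i in range(0, len(bits), 6):
        chunk = bits[i:i + 6]
        # A short final chunk is zero-padded on the right before the table lookup
        char = ALPHABET[int(chunk.ljust(6, "0"), 2)]
        parts.append(f"{chunk} ({char})")
    return " ".join(parts)


def b64_encode(data: bytes) -> str:
    """Encode by hand: 6-bit chunks, table lookup, then '=' up to a multiple of 4."""
    bits = to_bits(data)
    chars = "".join(
        ALPHABET[int(bits[i:i + 6].ljust(6, "0"), 2)]
        for i in range(0, len(bits), 6)
    )
    return chars + "=" * (-len(chars) % 4)


print(print_b64_bits(b"hello\n"))
print(print_b64_bits(b"hello"))
print(b64_encode(b"hello\n"))  # aGVsbG8K
print(b64_encode(b"hello"))    # aGVsbG8=
```

Unlike the listing above, this sketch prints the short final chunk as its raw bits (1111 for hello without a newline) so that the zero-padding step stays visible.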

Note: A trailing newline is not always encoded as K. The last character can also be o or g, depending on the number of input bytes. Consider the case below:

echo helllo | print-bits

# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)

echo helllo | print-b64-bits

# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 110001 (x) 101111 (v) 000010 (C) 10 (g)

In the case above, the last 2 bits are first padded with zeros to form 100000, and only then converted to a printable character. The last output character is now g.

And since there are 10 output characters, two = characters need to be added to make 12 (a multiple of 4).
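To see all three outcomes side by side, the sketch below encodes inputs whose byte counts (newline included) leave remainders 0, 1 and 2 when divided by 3. It uses Python's standard base64 module; last_data_char is a hypothetical helper introduced only for this illustration.

```python
import base64


def last_data_char(data: bytes) -> str:
    """Return the final Base64 character that carries data, ignoring '=' padding."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.rstrip("=")[-1]


# Inputs of 6, 7 and 8 bytes, each ending in a newline
for text in (b"hello\n", b"helllo\n", b"hellllo\n"):
    encoded = base64.b64encode(text).decode("ascii")
    print(len(text) % 3, encoded, last_data_char(text))

# 0 aGVsbG8K K
# 1 aGVsbGxvCg== g
# 2 aGVsbGxsbwo= o
```

So the character produced by the trailing newline is K, g or o depending on where the newline byte falls within its 3-byte group, and only the last two cases are followed by real = padding.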

Upvotes: 29
