Reputation: 40133
I've noticed that PHP's base64_encode uses '=' as a padding character. According to Wikipedia, the different Base64 variants use either '=' or no padding at all. However, the CLI base64 command and openssl enc -base64 both appear to use 'K' as the padding. I am looking for information on why that is and which implementations they use.
echo base64_encode('hello'); // aGVsbG8=
echo hello | base64 -i - # aGVsbG8K
openssl enc -base64 <<< hello # aGVsbG8K
Upvotes: 14
Views: 5466
Reputation: 10868
'K' is not a padding character. It is the encoding of the trailing newline that the shell commands add to the input: echo appends one, and so does the <<< here-string.
echo hello | openssl enc -base64 # aGVsbG8K
echo -n hello | openssl enc -base64 # aGVsbG8=
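The same comparison can be reproduced outside the shell. The following is a minimal Python sketch using the standard library's base64 module; the variable names are only illustrative, and the two byte strings stand in for what echo hello and echo -n hello write to the pipe.

```python
import base64

# `echo hello` writes b"hello\n"; `echo -n hello` writes b"hello".
with_newline = base64.b64encode(b"hello\n").decode("ascii")
without_newline = base64.b64encode(b"hello").decode("ascii")

print(with_newline)     # aGVsbG8K
print(without_newline)  # aGVsbG8=
```

The only difference between the two inputs is the final newline byte, which is what turns the trailing '=' into 'K'.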
UPDATE:
Base64 splits the provided bitstream into 6-bit chunks instead of 8-bit chunks. Then a special table (different from the ASCII table) of 64 printable characters (hence the name of the encoding) is used to convert these 6-bit chunks to characters. In the standard alphabet the values 0 to 63 map to A-Z, a-z, 0-9, '+' and '/', in that order.
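To make the lookup table concrete, here is a small Python sketch of the standard Base64 alphabet as defined in RFC 4648. The names ALPHABET and lookup are hypothetical helpers introduced only for this illustration.

```python
import string

# Standard Base64 alphabet (RFC 4648): values 0-25 map to A-Z, 26-51 to a-z,
# 52-61 to 0-9, and 62 and 63 to '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def lookup(six_bits: str) -> str:
    """Map a 6-bit binary string such as '011010' to its Base64 character."""
    return ALPHABET[int(six_bits, 2)]

print(len(ALPHABET))     # 64
print(lookup("011010"))  # a
print(lookup("001010"))  # K
```

Note that 'K' is simply the character for the value 10, so it is an ordinary data character like any other.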
Let's see it in practice (print-bits and print-b64-bits are imaginary commands).
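Since those two commands do not exist, a rough Python stand-in may help anyone who wants to try the examples. This is only a sketch: show_bits and show_b64_bits are hypothetical names, and the second function always zero-fills the final group to a full 6 bits before looking it up.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def show_bits(data: bytes) -> str:
    """Render every input byte as 8 bits plus the character it represents."""
    labels = {10: "\\n"}  # show the newline byte as \n instead of a line break
    return " ".join(f"{b:08b} ({labels.get(b, chr(b))})" for b in data)

def show_b64_bits(data: bytes) -> str:
    """Regroup the input bits into 6-bit chunks and label each with its Base64 character."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 6)  # zero-fill the final group if it is short
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    return " ".join(f"{g} ({ALPHABET[int(g, 2)]})" for g in groups)

print(show_bits(b"hello\n"))
print(show_b64_bits(b"hello\n"))
```

Calling the functions with b"hello\n" corresponds to echo hello, and b"hello" corresponds to echo -n hello.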
With newline:
echo hello | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)
echo hello | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8) 001010 (K)
No newline:
echo -n hello | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o)
echo -n hello | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8)
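In the latter case the 40 input bits do not divide evenly by 6: the final group holds only the 4 bits 1111 and is zero-filled to 111100 ('8'). That gives 7 output characters, so one '=' is appended to bring the length to 8 (a multiple of 4).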
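The amount of padding follows directly from the input length. The sketch below computes it in Python and checks it against the standard library; padding_needed is a hypothetical helper name, not part of any library.

```python
import base64

def padding_needed(n_bytes: int) -> int:
    """Number of '=' characters Base64 appends for an input of n_bytes bytes."""
    n_chars = (n_bytes * 8 + 5) // 6  # 6-bit groups, rounding up
    return -n_chars % 4               # fill up to the next multiple of 4

for n in (5, 6, 7):
    encoded = base64.b64encode(b"x" * n)
    print(n, padding_needed(n), encoded.count(b"="))
```

For 5 bytes (hello without a newline) this gives 7 characters and one '='; for 6 bytes (hello plus newline) it gives 8 characters and no padding.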
Note: A newline at the end is not always converted to 'K'. It could also come out as 'o' or 'g'. This depends on the number of input bytes. Consider the case below:
echo helllo | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)
echo helllo | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 110001 (x) 101111 (v) 000010 (C) 10 (g)
In the case above the last 2 bits are first padded with zeros to 100000, and only then converted to a printable character. The last output character is now 'g'. And since there are 10 output characters, two '=' need to be added to make 12 (a multiple of 4).
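The three possible outcomes for a trailing newline can be checked with a short Python sketch. The helper last_symbol is a hypothetical name, and the third input (hellllo) is an extra example chosen here to cover the remaining case.

```python
import base64

def last_symbol(data: bytes) -> str:
    """Return the last Base64 character before any '=' padding."""
    return base64.b64encode(data).decode("ascii").rstrip("=")[-1]

# Each input ends with a newline; only the total length modulo 3 differs.
for word in (b"hello\n", b"helllo\n", b"hellllo\n"):
    encoded = base64.b64encode(word).decode("ascii")
    print(len(word) % 3, encoded, last_symbol(word))
```

When the length is a multiple of 3 the newline's last 6 bits map to 'K'; with a remainder of 1 the last character is 'g', and with a remainder of 2 it is 'o'.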
Upvotes: 29