Reputation: 3552
Below is a Base64 image-encoding function that I got from Philippe Tenenhaus (http://www.philten.com/us-xmlhttprequest-image/).
I find it very confusing, but I'd love to understand it.
I think I understand the bitwise & and | operators, and moving bits around with << and >>.
I'm especially confused by these two expressions: ((byte1 & 3) << 4) | (byte2 >> 4) and ((byte2 & 15) << 2) | (byte3 >> 6).
Why is byte1 still used for enc2, and byte2 for enc3?
And what is the purpose of enc4 = byte3 & 63?
...
Could someone explain this function?
function base64Encode(inputStr)
{
    var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    var outputStr = "";
    var i = 0;
    while (i < inputStr.length)
    {
        //all three "& 0xff" added below are there to fix a known bug
        //with bytes returned by xhr.responseText
        var byte1 = inputStr.charCodeAt(i++) & 0xff;
        var byte2 = inputStr.charCodeAt(i++) & 0xff;
        var byte3 = inputStr.charCodeAt(i++) & 0xff;
        var enc1 = byte1 >> 2;
        var enc2 = ((byte1 & 3) << 4) | (byte2 >> 4);
        var enc3, enc4;
        if (isNaN(byte2))
        {
            enc3 = enc4 = 64;
        }
        else
        {
            enc3 = ((byte2 & 15) << 2) | (byte3 >> 6);
            if (isNaN(byte3))
            {
                enc4 = 64;
            }
            else
            {
                enc4 = byte3 & 63;
            }
        }
        outputStr += b64.charAt(enc1) + b64.charAt(enc2) + b64.charAt(enc3) + b64.charAt(enc4);
    }
    return outputStr;
}
Upvotes: 0
Views: 239
Reputation: 118
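It probably helps to understand what Base64 encoding does: it takes 24 bits, held as three 8-bit bytes, and regroups them into four 6-bit values (http://en.wikipedia.org/wiki/Base64). Because 8 is not a multiple of 6, the bits left over from one byte have to be combined with bits from the next byte.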
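As a rough illustration of that regrouping, here is a minimal sketch that packs three bytes into one 24-bit number and then cuts it into four 6-bit values. The helper name splitIntoSextets is made up for this example and is not part of the original function, which computes the same four values without building the 24-bit number.

```javascript
// Hypothetical helper (not in the original code): regroup three
// 8-bit bytes into four 6-bit values.
function splitIntoSextets(byte1, byte2, byte3) {
    // Pack the three bytes into a single 24-bit number.
    var bits24 = (byte1 << 16) | (byte2 << 8) | byte3;
    // Cut the 24 bits into four groups of 6 bits, highest group first.
    return [
        (bits24 >> 18) & 63,
        (bits24 >> 12) & 63,
        (bits24 >> 6) & 63,
        bits24 & 63
    ];
}

var b64Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// Character codes of "Man": 77, 97, 110.
var manSextets = splitIntoSextets(77, 97, 110); // [19, 22, 5, 46]
var manEncoded = manSextets.map(function (v) {
    return b64Alphabet.charAt(v);
}).join(""); // "TWFu"
console.log(manSextets, manEncoded);
```

Each 6-bit value is between 0 and 63, which is why it can be used directly as an index into the 64-character alphabet.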
So enc1 is the first 6 bits, which are the top 6 bits of the first byte (byte1 >> 2 drops the bottom 2 bits).
enc2 is the next 6 bits: the last 2 bits of the first byte followed by the first 4 bits of the second byte. That is why byte1 appears again in enc2. The bitwise AND byte1 & 3 keeps only the last 2 bits of the first byte:
XXXXXXXX & 00000011 = 000000XX
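The result is then shifted left by 4 bits to make room for the 4 bits coming from the second byte:
000000XX << 4 = 00XX0000
byte2 >> 4 shifts right by 4 bits, isolating the first 4 bits of the second byte (the X's here are the low bits of the second byte, which are discarded):
YYYYXXXX >> 4 = 0000YYYY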
So ((byte1 & 3) << 4) | (byte2 >> 4) combines the two results with a bitwise OR:
00XX0000 | 0000YYYY = 00XXYYYY
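To make the diagrams concrete, the following sketch prints each intermediate step of enc2 as an 8-bit binary string. The helper names toBits and enc2Steps are invented for this illustration, and the sample bytes 77 and 97 (the character codes of "M" and "a") are just an arbitrary choice.

```javascript
// Hypothetical helper: format a number as an 8-digit binary string.
function toBits(n) {
    return n.toString(2).padStart(8, "0");
}

// Hypothetical helper: show every step of the enc2 computation.
function enc2Steps(byte1, byte2) {
    var low2 = byte1 & 3;        // keep the last 2 bits of byte1
    var shifted = low2 << 4;     // move them to bit positions 5 and 4
    var high4 = byte2 >> 4;      // keep the first 4 bits of byte2
    var enc2 = shifted | high4;  // merge into a single 6-bit value
    return {
        low2: toBits(low2),
        shifted: toBits(shifted),
        high4: toBits(high4),
        enc2: toBits(enc2),
        value: enc2
    };
}

// byte1 = 77 = 01001101, byte2 = 97 = 01100001
console.log(enc2Steps(77, 97));
```

With these inputs the steps come out as 00000001, 00010000, 00000110 and finally 00010110, which is 22, the index of "W" in the Base64 alphabet.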
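enc3 is built the same way: the last 4 bits of the second byte, (byte2 & 15) << 2, followed by the first 2 bits of the third byte, byte3 >> 6. That is why byte2 appears again in enc3.
enc4 is the last 6 bits of the third byte. byte3 & 63 masks off the top 2 bits, which were already used by enc3.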
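The same kind of check works for the last two values. This is a small sketch, with the made-up helper name lastTwoSextets, using the sample bytes 97 and 110 (the character codes of "a" and "n").

```javascript
// Hypothetical helper: compute enc3 and enc4 from the second and third bytes.
function lastTwoSextets(byte2, byte3) {
    // 15 is 00001111: keep the last 4 bits of byte2, then shift left by 2
    // to leave room for the first 2 bits of byte3.
    var enc3 = ((byte2 & 15) << 2) | (byte3 >> 6);
    // 63 is 00111111: keep the last 6 bits of byte3.
    var enc4 = byte3 & 63;
    return [enc3, enc4];
}

// byte2 = 97  = 01100001 -> last 4 bits  0001
// byte3 = 110 = 01101110 -> first 2 bits 01, last 6 bits 101110
console.log(lastTwoSextets(97, 110)); // [5, 46]
```

The values 5 and 46 are the indexes of "F" and "u", the last two characters of "TWFu", which is the Base64 encoding of "Man".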
charCodeAt returns a UTF-16 code unit, which is a 16-bit value, so the & 0xff masks assume that the relevant information is only in the low 8 bits. This assumption makes me wonder whether there is still a bug in the code: for any character whose code is above 255, the upper 8 bits would be silently discarded.
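One related problem can be checked directly. When the input length is not a multiple of 3, charCodeAt is called past the end of the string and returns NaN, but NaN & 0xff evaluates to 0. The masked byte2 and byte3 are therefore never NaN, so the isNaN branches that select index 64 (the "=" padding character) are never taken. The sketch below is one possible way around that, not the original author's code: it tests the raw charCodeAt result before masking. The name base64EncodePadded and the code1 to code3 variables are made up for this example.

```javascript
// A possible variant of the question's function (hypothetical name):
// the NaN test is done on the raw charCodeAt result, before "& 0xff".
function base64EncodePadded(inputStr) {
    var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    var outputStr = "";
    var i = 0;
    while (i < inputStr.length) {
        // Raw values: NaN when i is past the end of the string.
        var code1 = inputStr.charCodeAt(i++);
        var code2 = inputStr.charCodeAt(i++);
        var code3 = inputStr.charCodeAt(i++);
        // Masked values: NaN & 0xff is 0, so these are always numbers.
        var byte1 = code1 & 0xff;
        var byte2 = code2 & 0xff;
        var byte3 = code3 & 0xff;
        var enc1 = byte1 >> 2;
        var enc2 = ((byte1 & 3) << 4) | (byte2 >> 4);
        // Index 64 is "=", used as padding when a byte is missing.
        var enc3 = isNaN(code2) ? 64 : (((byte2 & 15) << 2) | (byte3 >> 6));
        var enc4 = (isNaN(code2) || isNaN(code3)) ? 64 : (byte3 & 63);
        outputStr += b64.charAt(enc1) + b64.charAt(enc2) + b64.charAt(enc3) + b64.charAt(enc4);
    }
    return outputStr;
}

console.log(NaN & 0xff);               // 0
console.log(base64EncodePadded("M"));   // "TQ=="
console.log(base64EncodePadded("Ma"));  // "TWE="
console.log(base64EncodePadded("Man")); // "TWFu"
```

For the one-character input "M", the function in the question would produce "TQAA" rather than "TQ==", because the missing bytes are treated as zeros instead of being padded.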
Upvotes: 1