trogne

Reputation: 3552

Base64 encoding in JavaScript

Below is a Base64 image encoding function that I got from Philippe Tenenhaus (http://www.philten.com/us-xmlhttprequest-image/).

It's very confusing to me, but I'd love to understand it.

I think I understand the bitwise & and | operators, and moving bits to a new position with << and >>.

I'm especially confused by these lines: ((byte1 & 3) << 4) | (byte2 >> 4); and ((byte2 & 15) << 2) | (byte3 >> 6);

I also don't see why it still uses byte1 for enc2 and byte2 for enc3, or what the purpose of enc4 = byte3 & 63; is.

Could someone explain this function?

function base64Encode(inputStr) 
            {
               var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
               var outputStr = "";
               var i = 0;

               while (i < inputStr.length)
               {
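                   //charCodeAt returns NaN past the end of the string, so keep
                   //the raw values for the padding checks further down
                   var code1 = inputStr.charCodeAt(i++);
                   var code2 = inputStr.charCodeAt(i++);
                   var code3 = inputStr.charCodeAt(i++);

                   //all three "& 0xff" below are there to fix a known bug
                   //with bytes returned by xhr.responseText
                   //(NaN & 0xff is 0, so a missing byte becomes 0 here)
                   var byte1 = code1 & 0xff;
                   var byte2 = code2 & 0xff;
                   var byte3 = code3 & 0xff;

                   var enc1 = byte1 >> 2;
                   var enc2 = ((byte1 & 3) << 4) | (byte2 >> 4);

                   var enc3, enc4;
                   if (isNaN(code2))
                   {
                       //only one input byte left: pad with "==" (index 64 is "=")
                       enc3 = enc4 = 64;
                   }
                   else
                   {
                       enc3 = ((byte2 & 15) << 2) | (byte3 >> 6);
                       if (isNaN(code3))
                       {
                           //only two input bytes left: pad with "="
                           enc4 = 64;
                       }
                       else
                       {
                           enc4 = byte3 & 63;
                       }
                   }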

                   outputStr += b64.charAt(enc1) + b64.charAt(enc2) + b64.charAt(enc3) + b64.charAt(enc4);
                } 

                return outputStr;
            }

Upvotes: 0

Views: 239

Answers (1)

Christopher Gress

Reputation: 118

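It probably helps to understand what Base64 encoding does. It takes 24 bits at a time, as three 8-bit bytes, and regroups them into four 6-bit values. Each 6-bit value (0 to 63) is then used as an index into the 64-character alphabet. (http://en.wikipedia.org/wiki/Base64)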
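To make the regrouping visible, here is a small sketch that writes three bytes out as one 24-character bit string and cuts it into 6-bit pieces. The sample bytes (the character codes of "Man") and the variable names are my own choice for illustration, not part of the original function.

```javascript
// Sample input: the character codes of "M", "a", "n" (assumed for illustration).
var bytes = [77, 97, 110];

// Write each byte as 8 binary digits and join them into one 24-bit string.
var bits = bytes.map(function (b) {
    return ("00000000" + b.toString(2)).slice(-8);
}).join("");

// Cut the 24 bits into four groups of 6 and read each group as a number.
var groups = bits.match(/.{6}/g);
var values = groups.map(function (g) { return parseInt(g, 2); });

console.log(bits);   // "010011010110000101101110"
console.log(groups); // ["010011", "010110", "000101", "101110"]
console.log(values); // [19, 22, 5, 46]
```

The function in the question arrives at the same four numbers, only with shifts and masks instead of string slicing.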

So enc1 is the first 6 bits of the first byte: byte1 >> 2 shifts out its last 2 bits.

enc2 is the next 6 bits: the last 2 bits of the first byte followed by the first 4 bits of the second byte. That is why byte1 appears again. The bitwise AND in byte1 & 3 keeps only the last 2 bits of the first byte. So,

XXXXXXXX & 00000011 = 000000XX

It is then shifted to the left 4 bits.

000000XX << 4 = 00XX0000.

byte2 >> 4 shifts the second byte right by 4 bits, isolating its first 4 bits:

YYYYXXXX >> 4 = 0000YYYY

So ((byte1 & 3) << 4) | (byte2 >> 4) combines the two results with a bitwise OR:

00XX0000 | 0000YYYY = 00XXYYYY
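The same steps with concrete numbers, as a rough sketch. The two sample bytes (77 and 97, the codes of "M" and "a") and the intermediate variable names are assumptions for illustration only.

```javascript
// Sample bytes, assumed for illustration.
var byte1 = 77; // 01001101
var byte2 = 97; // 01100001

var lowTwo = byte1 & 3;        // 00000001 -> keeps the last 2 bits of byte1
var shifted = lowTwo << 4;     // 00010000 -> moves them to bit positions 5 and 4
var highFour = byte2 >> 4;     // 00000110 -> the first 4 bits of byte2
var enc2 = shifted | highFour; // 00010110 -> both parts merged

console.log(enc2); // 22, which is "W" in the Base64 alphabet
```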

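enc3 follows the same pattern: it is the last 4 bits of the second byte (byte2 & 15, shifted left by 2) followed by the first 2 bits of the third byte (byte3 >> 6).

enc4 is the last 6 bits of the third byte. 63 is 00111111 in binary, so byte3 & 63 clears the first 2 bits, which enc3 already used, and keeps the rest.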
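Putting the four expressions side by side may help. This is only a sketch: splitThreeBytes is a hypothetical helper name, it assumes three full bytes in the range 0 to 255, and it leaves out the "=" padding case.

```javascript
// Hypothetical helper: split three bytes (0-255 each) into four 6-bit indexes.
function splitThreeBytes(byte1, byte2, byte3) {
    return [
        byte1 >> 2,                         // first 6 bits of byte1
        ((byte1 & 3) << 4) | (byte2 >> 4),  // last 2 of byte1 + first 4 of byte2
        ((byte2 & 15) << 2) | (byte3 >> 6), // last 4 of byte2 + first 2 of byte3
        byte3 & 63                          // last 6 bits of byte3
    ];
}

var alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

var indexes = splitThreeBytes(77, 97, 110); // codes of "M", "a", "n"
var encoded = indexes.map(function (n) { return alphabet.charAt(n); }).join("");

console.log(indexes); // [19, 22, 5, 46]
console.log(encoded); // "TWFu"
```

Every input bit ends up in exactly one of the four outputs, which is why byte1 and byte2 each show up in two expressions.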

charCodeAt returns a UTF-16 code unit, which is a 16-bit value (0 to 65535), and the & 0xff mask throws away the upper 8 bits. So the code assumes that the relevant information is only in the low 8 bits. That assumption makes me wonder whether there is still a bug in the code: for any character with a code above 255, information is lost.
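A quick way to see both effects of the mask, as a sketch with sample values of my own choosing (the euro sign is just an example of a character above 255):

```javascript
// A character whose code does not fit in 8 bits.
var euro = "\u20AC".charCodeAt(0); // 8364 (0x20AC)
var masked = euro & 0xff;          // 172 (0xAC), the upper byte 0x20 is gone

// Reading past the end of a string gives NaN, and masking NaN gives 0.
var pastEnd = "A".charCodeAt(1);   // NaN
var maskedPastEnd = pastEnd & 0xff; // 0

console.log(euro, masked);
console.log(isNaN(pastEnd), isNaN(maskedPastEnd)); // true false
```

The second part matters for padding: once a value has been masked it is never NaN, so any end-of-input check has to look at the value before the mask is applied.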

Upvotes: 1
