Reputation: 1

C++ Base64 Unicode - null bytes

I am trying to Base64-encode a Unicode string. The output is my string in Base64, but there are null bytes at seemingly random places throughout it. I don't know why they appear or how to get rid of them.

Here is my Base64Encode function:

static char Base64Digits[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
int Base64Encode(const BYTE* pSrc, int nLenSrc, wchar_t* pDst, int nLenDst)
{
    int nLenOut = 0;
    while (nLenSrc > 0) {
        if (nLenOut + 4 > nLenDst) return(0); // error

        // read three source bytes (24 bits)
        BYTE s1 = pSrc[0];   // (but avoid reading past the end)
        BYTE s2 = 0; if (nLenSrc > 1) s2 = pSrc[1]; //------ corrected, thanks to  jprichey
        BYTE s3 = 0; if (nLenSrc > 2) s3 = pSrc[2];

        DWORD n;
        n =  s1;    // xxx1
        n <<= 8;    // xx1x
        n |= s2;    // xx12
        n <<= 8;    // x12x
        n |= s3;    // x123

        //-------------- get four 6-bit values for lookups
        BYTE m4 = n & 0x3f;  n >>= 6;
        BYTE m3 = n & 0x3f;  n >>= 6;
        BYTE m2 = n & 0x3f;  n >>= 6;
        BYTE m1 = n & 0x3f;

        //------------------ lookup the right digits for output
        BYTE b1 = Base64Digits[m1];
        BYTE b2 = Base64Digits[m2];
        BYTE b3 = Base64Digits[m3];
        BYTE b4 = Base64Digits[m4];

        //--------- end of input handling
        *pDst++ = b1;
        *pDst++ = b2;
        if (nLenSrc >= 3) {  // 24 src bits left to encode, output xxxx
            *pDst++ = b3;
            *pDst++ = b4;
        }
        if (nLenSrc == 2) {  // 16 src bits left to encode, output xxx=
            *pDst++ = b3;
            *pDst++ = '=';
        }
        if (nLenSrc == 1) {  // 8 src bits left to encode, output xx==
            *pDst++ = '=';
            *pDst++ = '=';
        }
        pSrc    += 3;
        nLenSrc -= 3;
        nLenOut += 4;
    }
    // Could optionally append a NULL byte like so:
    // *pDst++ = 0; nLenOut++;
    return (nLenOut);
}

Not to fool anyone, but I copied the function from here

Here is how I call the function:

wchar_t base64[256];

Base64Encode((const unsigned char *)UserLoginHash, lstrlenW(UserLoginHash) * 2, base64, 256);
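For reference, the cast in this call hands the encoder the raw bytes of the wide string rather than its characters. With 2-byte wide characters (as on Windows), every ASCII-range character contributes one zero byte to that input. The following is a minimal sketch of this, using char16_t to model a 2-byte wchar_t so that it behaves the same on other platforms; the helper name CountZeroBytes is hypothetical and not part of the code above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the zero bytes in the raw memory of a UTF-16
// string, i.e. the same bytes a (const unsigned char *) cast would expose.
std::size_t CountZeroBytes(const std::u16string& s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t byteCount = s.size() * sizeof(char16_t);

    std::size_t zeros = 0;
    for (std::size_t i = 0; i < byteCount; ++i) {
        if (p[i] == 0) ++zeros;
    }
    return zeros;
}
```

For example, CountZeroBytes(u"Hi") counts one zero byte per character. These zeros are part of the encoder's input and are expected for UTF-16 text.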

So, why are there random null bytes or "whitespace" in the generated hash? What should I change to get rid of them?

Upvotes: 0

Answers (2)

Remy Lebeau

Reputation: 597245

Try something more like this. Portions copied from my own base64 encoder:

static const wchar_t *Base64Digits = L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

int Base64Encode(const BYTE* pSrc, int nLenSrc, wchar_t* pDst, int nLenDst)
{
    int nLenOut = 0;

    while (nLenSrc > 0) {
        if (nLenDst < 4) return(0); // error

        // read up to three source bytes (24 bits) 
        int len = 0;
        BYTE s1 = pSrc[len++];
        BYTE s2 = (nLenSrc > 1) ? pSrc[len++] : 0;
        BYTE s3 = (nLenSrc > 2) ? pSrc[len++] : 0;
        pSrc += len;
        nLenSrc -= len;

        //------------------ lookup the right digits for output
        pDst[0] = Base64Digits[(s1 >> 2) & 0x3F];
        pDst[1] = Base64Digits[(((s1 & 0x3) << 4) | ((s2 >> 4) & 0xF)) & 0x3F];
        pDst[2] = Base64Digits[(((s2 & 0xF) << 2) | ((s3 >> 6) & 0x3)) & 0x3F];
        pDst[3] = Base64Digits[s3 & 0x3F];

        //--------- end of input handling
        if (len < 3) {  // less than 24 src bits encoded, pad with '='
          pDst[3] = L'=';
          if (len == 1)
            pDst[2] = L'=';
        }

        nLenOut += 4;
        pDst += 4;
        nLenDst -= 4;
    }

    if (nLenDst > 0) *pDst = 0;

    return (nLenOut);
}
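
If a fixed output buffer is not a requirement, the same three-bytes-in, four-digits-out scheme can return a std::wstring, which tracks its own length and so needs no manual terminator or size check. This is only a sketch under that assumption; the name Base64EncodeToString is hypothetical, and unsigned char stands in for BYTE so it compiles without the Windows headers.

```cpp
#include <cstddef>
#include <string>

// Sketch only: encodes len bytes from src and returns the Base64 text.
// Every output character comes from the digit table or is L'=', so the
// result itself never contains an embedded L'\0'.
std::wstring Base64EncodeToString(const unsigned char* src, std::size_t len)
{
    static const wchar_t digits[] =
        L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::wstring out;
    out.reserve(((len + 2) / 3) * 4);

    for (std::size_t i = 0; i < len; i += 3) {
        const std::size_t n = len - i;  // bytes left in the input, at least 1
        const unsigned char s1 = src[i];
        const unsigned char s2 = (n > 1) ? src[i + 1] : 0;
        const unsigned char s3 = (n > 2) ? src[i + 2] : 0;

        out += digits[s1 >> 2];
        out += digits[((s1 & 0x3) << 4) | (s2 >> 4)];
        // Pad with '=' when fewer than three source bytes were available.
        out += (n > 1) ? digits[((s2 & 0xF) << 2) | (s3 >> 6)] : L'=';
        out += (n > 2) ? digits[s3 & 0x3F] : L'=';
    }
    return out;
}
```

Called on the four bytes 0x48 0x00 0x69 0x00 (the text "Hi" as little-endian UTF-16), this returns L"SABpAA==". The zero bytes in the input become ordinary Base64 digits in the output.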

Upvotes: 2

J94M

Reputation: 21

The problem, from what I can see, is that as the encoder works, occasionally it is adding a value to a certain character value, for example, let's say U+0070 + U+0066 (this is just an example). At some point, these values equal the null terminator (\0) or something equivalent to it, making it so the program doesn't read past that point when outputting the string and making it appear shorter than it should be.

I've encountered this problem with my own encoding algorithm before, and the best solution appears to be to add more variability to your algorithm; so, instead of only adding characters to the string, subtract some, multiply or XOR some at some point in the algorithm. This should remove (or at least reduce the chances of) null terminators appearing where you don't want them. This may, however, take some trial-and-error on your part to see what works and what doesn't.

Upvotes: 0
