ABu

Reputation: 12249

Base64 encoding and equal sign at the end, instead of A (base64 value of number 0)

According to Wikipedia:

When the number of bytes to encode is not divisible by three (that is, if there are only one or two bytes of input for the last 24-bit block), then the following action is performed:

Add extra bytes with value zero so there are three bytes, and perform the conversion to base64.

However, if we add an extra \0 byte at the end, the last 6 bits of the input have the value 0, and the number 0 must be base64-encoded as A. The character = isn't even in the base64 encoding table.

I know that those extra null bytes don't belong to the original binary string, so we use a different character (=) to avoid confusion. But the Wikipedia article and thousands of other sites don't say that. They say that the newly constructed string must be base64-encoded (a sentence which strictly implies the use of the transformation table).
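The two readings can be compared directly. Below is a minimal Python sketch for illustration only; `encode_zero_padded` and `encode_with_equals` are hypothetical names, and the results are checked against Python's standard `base64` module. It suggests that the table is still applied to every 6-bit group containing at least one real input bit (the zero fill only completes that group), while groups built entirely from the added zero bytes are written as `=` rather than `A`.

```python
import base64

# The standard base64 alphabet (RFC 4648, Table 1).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_zero_padded(data: bytes) -> str:
    """Literal reading: append zero bytes, then map every 6-bit group through the table."""
    padded = data + b"\x00" * (-len(data) % 3)
    out = []
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")  # one 24-bit block
        out.extend(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
    return "".join(out)


def encode_with_equals(data: bytes) -> str:
    """Same bits, but characters built only from the added zero bytes become '='."""
    text = encode_zero_padded(data)
    pad = -len(data) % 3  # number of zero bytes that were appended
    return text[:len(text) - pad] + "=" * pad


print(encode_zero_padded(b"M"))   # TQAA
print(encode_with_equals(b"M"))   # TQ==
print(base64.b64encode(b"M"))     # b'TQ=='
```

Note that the `Q` in `TQ==` comes from the table even though four of its six bits are zero fill; only the two positions made purely of padding are replaced.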

Are all of these sites wrong?

Upvotes: 0

Views: 3224

Answers (1)

supercat

Reputation: 81115

Any sequence of four characters chosen from the main base64 set will represent precisely three octets' worth of data. Consequently, if the total length of the file to be encoded is not a multiple of three octets, it will be necessary to either:

  1. Allow the encoded file to have a length which is not a multiple of 4.

  2. Allow the encoded file to have characters outside the main set of 64.
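The two options can be made concrete by counting characters. A small Python sketch follows; the function names are hypothetical, and the loop at the end checks `padded_length` against Python's `base64.b64encode` output and `unpadded_length` against the same output with the `=` characters stripped.

```python
import base64


def padded_length(n: int) -> int:
    # Option 2: every group of up to 3 octets becomes exactly 4 characters.
    return 4 * ((n + 2) // 3)


def unpadded_length(n: int) -> int:
    # Option 1: only enough characters to hold 8*n bits, i.e. ceil(4n/3).
    return (4 * n + 2) // 3


# Sanity check against the standard library encoder.
for n in range(100):
    encoded = base64.b64encode(bytes(n))
    assert padded_length(n) == len(encoded)
    assert unpadded_length(n) == len(encoded.rstrip(b"="))

for n in (1, 2, 3, 32):
    print(n, padded_length(n), unpadded_length(n))
# 1 4 2
# 2 4 3
# 3 4 4
# 32 44 43
```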

If the former approach were used, then concatenating files whose length was not a multiple of three would be likely to yield a file that might appear valid but would contain bogus information. For example, a 32-octet file would expand to ten groups of four base64 characters plus three more for the final pair of octets (43 in total). Concatenating the encoding of another 32-octet file would yield 86 characters, which might look valid, but the information from the second half would not decode correctly.
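To see this failure mode concretely, here is a small Python sketch of the first approach. `encode_unpadded` and `decode_unpadded` are hypothetical helpers written for illustration, not part of any standard; the decoder simply consumes 6 bits per character and drops leftover bits at the end. Decoding the 86-character concatenation yields 64 octets, the right total, but the second file's bits arrive two positions late because of the zero fill left over from the first file.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_unpadded(data: bytes) -> str:
    # Approach 1 (hypothetical): standard base64 with the '=' characters dropped.
    return base64.b64encode(data).decode("ascii").rstrip("=")


def decode_unpadded(text: str) -> bytes:
    # Streaming decoder: accumulate 6 bits per character, emit each full
    # octet, and silently discard any bits left over at the end.
    out = bytearray()
    acc = 0
    bits = 0
    for ch in text:
        acc = (acc << 6) | ALPHABET.index(ch)
        bits += 6
        if bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
            acc &= (1 << bits) - 1
    return bytes(out)


first = bytes(range(32))          # 32 octets -> 43 characters
second = bytes(range(100, 132))   # 32 octets -> 43 characters
joined = encode_unpadded(first) + encode_unpadded(second)

decoded = decode_unpadded(joined)
print(len(joined), len(decoded))  # 86 64
print(decoded[:32] == first)      # True
print(decoded[32:] == second)     # False
print(decoded[32], second[0])     # 25 100
```

The first file survives intact, but nothing in the 86-character string signals that the octets after the boundary are misaligned.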

With the latter approach, concatenating files whose length was not a multiple of three would yield a result that could either be parsed unambiguously or, at worst, be recognized as invalid (the base64 standard does not regard a file containing "=" anywhere but at the end as valid, but one could write a decoder that processes such files unambiguously). In any case, having such a file regarded as invalid would be better than having a file that appears valid but produces incorrect data when decoded.
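A sketch of the kind of tolerant decoder mentioned above, in Python; `decode_concatenated` is a hypothetical helper, not a standard API. Because each padded encoding is a whole number of four-character groups, decoding group by group recovers both files exactly, even though a group ending in `=` sits in the middle, and an input whose length is not a multiple of four is rejected outright.

```python
import base64


def decode_concatenated(text: str) -> bytes:
    # Hypothetical tolerant decoder: padded encodings are always a multiple
    # of 4 characters, so each 4-character group can be decoded on its own.
    if len(text) % 4:
        raise ValueError("length is not a multiple of 4; input is invalid")
    return b"".join(base64.b64decode(text[i:i + 4]) for i in range(0, len(text), 4))


first = bytes(range(32))
second = bytes(range(100, 132))
joined = base64.b64encode(first).decode("ascii") + base64.b64encode(second).decode("ascii")

print(len(joined))                                    # 88
print(decode_concatenated(joined) == first + second)  # True
```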

Upvotes: 1
