Costin

Reputation: 3029

Failing to correctly encode and decode using java.util.Base64

Let "awids" be 12-character IDs in base 64 (A-Z a-z 0-9 "-" "@"). This is the input.

My final goal is to create a bijective mapping between these awids and UUIDs, using some padding, with the awids as the initial input.

When I use java.util.Base64, I do not get the initial value back after decoding and then encoding again. What is the stupid error I am making? :)

With the reproducible example below, the output is wrong: the input string is not recovered after a decode()-encode() round trip, and the bijection is not preserved (Q39s/L and Q39s/A both map to the same value).

    ------------------------------------------> Q39s/L (6 [51 33 39 73 2f 4c]) 
    4 [43 7f 6c fc] -> 6 [51 33 39 73 2f 41] -> Q39s/A (6 [51 33 39 73 2f 41]) 
    4 [43 7f 6c fc] -> 6 [51 33 39 73 2f 41] -> Q39s/A (6 [51 33 39 73 2f 41])

Here is a reproducible example:



    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import java.util.StringJoiner;

    public class StackOverflowQuestion {

      public static void main(String[] args) {

        String halfAwid = "Q39s/L";

        byte[] sigBits = Base64.getDecoder().decode(halfAwid.getBytes(StandardCharsets.UTF_8));

        byte[] actualSigBits = Base64.getEncoder().withoutPadding().encode(sigBits);

        String actualHalfAwid = new String(actualSigBits, StandardCharsets.UTF_8);

        // Second round trip: decode the re-encoded value rather than the original input.
        byte[] sigBits2 = Base64.getDecoder().decode(actualHalfAwid.getBytes(StandardCharsets.UTF_8));
        byte[] actualSigBits2 = Base64.getEncoder().withoutPadding().encode(sigBits2);
        String actualHalfAwid2 = new String(actualSigBits2, StandardCharsets.UTF_8);

        System.out.println("----------------------------------------------> "
            + halfAwid + " (" + toHexString(halfAwid) + ") "
            + "\n"
            + "    "
            + toHexString(sigBits) + " -> "
            + toHexString(actualSigBits) + " -> "
            + actualHalfAwid + " (" + toHexString(actualHalfAwid) + ") "
            + "\n"
            + "    "
            + toHexString(sigBits2) + " -> "
            + toHexString(actualSigBits2) + " -> "
            + actualHalfAwid2 + " (" + toHexString(actualHalfAwid2) + ")"
            + "");
      }

      private static String toHexString(byte[] bytes) {
        StringJoiner joiner = new StringJoiner(" ", "" + bytes.length + " [", "]");
        for (byte b : bytes) {
          joiner.add(String.format("%02x", b));
        }
        return joiner.toString();
      }

      private static String toHexString(String text) {
        return toHexString(text.getBytes(StandardCharsets.UTF_8));
      }
    }

Do not hesitate to point out any other errors in the code, even if they are not directly related to the question. Thank you.

Upvotes: 4

Views: 1516

Answers (1)

Holger

Reputation: 298153

Base64 encoding is not a bijective mapping for all input sizes if you treat the encoded data as a sequence of whole bytes (or ASCII characters). Base64 encodes units of eight bits as units of six bits (yielding 64 possible combinations for each unit), so when you encode four bytes, in other words 4×8=32 bits, you get 32/6=5⅓ units of output, which implies that the sixth unit of the output does not use all of its bits.

In other words, when you treat an arbitrary string consisting of six of the 64 defined characters as being Base64 encoded, you project a string with 64⁶ = 2³⁶ combinations onto a “source” sequence of four bytes with only 256⁴ = 2³² combinations, which implies data loss.
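As a minimal sketch of that data loss, the following decodes the two six-character strings from the question and compares the results. The class name `Base64CollisionDemo` and the helper `decode` are made up for illustration; only `java.util.Base64` calls from the question are used.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original question.
class Base64CollisionDemo {

  // Decodes an unpadded string with the basic Base64 decoder.
  static byte[] decode(String text) {
    return Base64.getDecoder().decode(text);
  }

  public static void main(String[] args) {
    byte[] fromL = decode("Q39s/L");
    byte[] fromA = decode("Q39s/A");

    // Six characters carry 36 bits, but only 32 of them fit into four bytes.
    // The low four bits of the last character are dropped while decoding,
    // so 'L' (001011) and 'A' (000000) leave the same two bits behind.
    System.out.println(Arrays.equals(fromL, fromA)); // true

    // Encoding again fills the missing bits with zeros, hence the trailing 'A'.
    System.out.println(Base64.getEncoder().withoutPadding().encodeToString(fromL)); // Q39s/A
  }
}
```

Any two six-character strings that differ only in the low four bits of the last character should collide in the same way.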

You can use Base64 encoding as a bijective mapping if you choose input sizes which can be projected to a whole number of units, e.g. obviously six source bytes can be encoded as eight Base64 encoded bytes. But it doesn’t work for six encoded bytes. Interestingly, it will work for your actually desired size as nine source bytes will get encoded to exactly twelve encoded bytes: 9×8=72, 72/6=12.
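A rough sketch of the twelve-character case, under some assumptions: the class `AwidRoundTrip` and the helpers `toBytes` and `toAwid` are hypothetical names, and the sample value uses the standard Base64 alphabet (`+` and `/`) rather than the `-` and `@` mentioned in the question, so a character substitution step would still be needed for real awids. Padding the nine bytes out to a UUID is not shown.

```java
import java.util.Base64;

// Hypothetical demo class, not part of the original question.
class AwidRoundTrip {

  // 12 Base64 characters hold 12 x 6 = 72 bits, i.e. exactly 9 bytes.
  static byte[] toBytes(String awid) {
    if (awid.length() != 12) {
      throw new IllegalArgumentException("expected 12 characters, got " + awid.length());
    }
    return Base64.getDecoder().decode(awid);
  }

  // 9 bytes hold 72 bits, i.e. exactly 12 Base64 characters, so no padding is emitted.
  static String toAwid(byte[] bytes) {
    if (bytes.length != 9) {
      throw new IllegalArgumentException("expected 9 bytes, got " + bytes.length);
    }
    return Base64.getEncoder().encodeToString(bytes);
  }

  public static void main(String[] args) {
    String awid = "Q39s/LQ39s/L"; // sample value in the standard alphabet (assumption)
    byte[] bytes = toBytes(awid);

    System.out.println(bytes.length);               // 9
    System.out.println(toAwid(bytes).equals(awid)); // true
  }
}
```

Because every bit of the twelve characters lands in one of the nine bytes, nothing is dropped on decode and the round trip returns the original string.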

Upvotes: 2
