ali haider

Reputation: 20182

How to base-64 encode a hex string

I am trying to base-64 encode a hex string (copied below), but the value I get from the Java 8 call to encode to base64 does not match what I get from various online converters. I am trying to figure out which steps I am missing (or which missteps I am taking):

//hexString is 07050600030102
Base64.getEncoder().encodeToString(hexString.getBytes(StandardCharsets.UTF_8));

//output I am getting from Java 8 is copied below:
MDcwNTA2MDAwMzAxMDI=

//online converters:
BwUGAAMBAg==

Upvotes: 2

Views: 6306

Answers (2)

Kedar Mhaswade

Reputation: 4695

Jon's answer is correct, but I thought I would make an attempt to explain it a little differently. I reckon that encoding/decoding can be a little confusing at times.

When you say that your data is encoded as a "hex string", you mean that the data has been made printable. In fact, hex encoding is about the easiest thing you can do to any binary data if you want to print it: every byte becomes two characters from 0-9 and a-f, so nothing in the output is non-printable (on computer systems that we know of)!

To make it clearer, let's say someone gives you the "hex encoded" string a9 (the idea is the same as for your 07050600030102). This means that a certain byte stream, when written out as hex characters, reads a9. Since each of the hex characters (0-9, a-f) stands for one nibble, 0000 through 1111, you can decode the actual bits as 1010 1001 (the space is only there for readability). So what is hex encoded as a9 is in fact the single byte 10101001.

So, if you were to now "base64-encode" it, you should use 10101001 as the input! As a Java byte array this would be {-87}, because Java bytes are signed and -87 is the decimal value of the bit sequence 10101001 in two's complement representation.

When you call hexString.getBytes(StandardCharsets.UTF_8), or hexString.getBytes() (if the default charset on your computer is UTF-8), you get the bytes of the characters in hexString, encoded as UTF-8. Since UTF-8 is backward compatible with ASCII, for a9 you get a 2-byte array: the first byte is decimal 97 (binary 01100001), representing the character 'a', and the second byte is decimal 57 (binary 00111001), representing the character '9' (the digit, not the number 9). Thus, the byte array you get from the getBytes() call is {97, 57}.

As you can see, these are two different things. You want to base64-encode the bytes represented by the array {-87}, but you end up base64-encoding the bytes represented by the array {97, 57}.
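To see the difference concretely, here is a minimal sketch for the a9 example. The class name HexVersusText and the helpers textBytes and parsedByte are made up for illustration; only String.getBytes, Integer.parseInt and java.util.Base64 come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; the names are illustrative only.
public class HexVersusText {

    // What getBytes() does: encodes the characters themselves as UTF-8.
    static byte[] textBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // What was intended: read two hex digits as the value of a single byte.
    static byte[] parsedByte(String twoHexDigits) {
        return new byte[] { (byte) Integer.parseInt(twoHexDigits, 16) };
    }

    public static void main(String[] args) {
        byte[] asText = textBytes("a9");
        byte[] asByte = parsedByte("a9");

        System.out.println(Arrays.toString(asText)); // [97, 57]
        System.out.println(Arrays.toString(asByte)); // [-87]

        // Different input bytes, so different base64 output.
        System.out.println(Base64.getEncoder().encodeToString(asText)); // YTk=
        System.out.println(Base64.getEncoder().encodeToString(asByte)); // qQ==
    }
}
```

The same thing happens with 07050600030102: getBytes() yields the 14 bytes of the characters, not the 7 bytes they spell out.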

Upvotes: 4

Jon Skeet

Reputation: 1499770

This doesn't do what you expect it to:

hexString.getBytes(StandardCharsets.UTF_8)

That's just encoding the hex string as UTF-8 - you want to parse the hex string, so that each pair of hex digits ends up as a single byte. The fact that the base64 result is different is just because the bytes you're base64-encoding are different.

To parse a hex string into bytes, you can use Guava (amongst other libraries):

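import com.google.common.io.BaseEncoding;
// base16() expects upper-case A-F; use base16().lowerCase() for lower-case hex digits
byte[] bytes = BaseEncoding.base16().decode(hexString);
String base64 = BaseEncoding.base64().encode(bytes);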
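If adding a library is not an option, the parsing step can also be written by hand with only the JDK. The following is a rough sketch, not part of the answer above: the class HexToBase64 and its methods parseHex and hexToBase64 are hypothetical names, and the input is assumed to be an even-length string of hex digits with no separators.

```java
import java.util.Base64;

// Hypothetical helper class; names are illustrative only.
public class HexToBase64 {

    // Turns each pair of hex digits into one byte, e.g. "07" -> 0x07.
    static byte[] parseHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a hex digit.
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Base64-encodes the bytes that the hex string spells out.
    static String hexToBase64(String hex) {
        return Base64.getEncoder().encodeToString(parseHex(hex));
    }

    public static void main(String[] args) {
        System.out.println(hexToBase64("07050600030102")); // BwUGAAMBAg==
    }
}
```

Character.digit accepts both upper-case and lower-case hex digits, so this sketch does not care about the case of the input.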

Upvotes: 5
