Bit shift and convert character to unicode escape string

Question

I found a java class that convert byte or char to hexadecimal value. But I cannot understand the code clearly. Can you explain what the code do or where I can find more resources about this?

public class UnicodeFormatter {

    static public String byteToHex(byte b) {
        // Returns hex String representation of byte b
        char hexDigit[] = {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        char[] array = {hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f]};
        return new String(array);
    }

    static public String charToHex(char c) {
        // Returns hex String representation of char c
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }
} // class

Jo&#227;o Silva · Accepted Answer

First, let us start with some definitions:

A char, in Java, occupies 2 bytes;
Each byte is composed by 8 bits;
Each hexadecimal digit represents 4 binary digits or bits;

Therefore, a byte can be represented by 2 hexadecimal digits, i.e., two groups of 4 bits. This is exactly what is being done in the byteToHex method: it firsts splits the byte in two groups of 4 bits, and then maps each into an hexadecimal symbol, using the hexDigit array. Since the decimal value of each group of 4 bits can never be greater or equal than 16 (2^4), each group will always have a mapping in the hexDigits array.

For example, suppose you want to convert the number 29 to hexadecimal:

29 is represented in binary as 00011101;
Splitting 00011101 in two groups of 4 bits yields 0001 and 1101;
Programatically, the first group, 0001 can be obtained by shifting away the least significant 4 bits (1101) from the binary representation of 29. Then, 0001 would become the first 4 bits. This is accomplished in Java with (b >> 4);
The second group, is obtained by b & 0x0f, which is equivalent to 00011101 & 00001111 = 00001101 = 1101. By bit-ANDing the binary number with 0x0f you are clearing (setting to 0) everything except the least significant 4 bits.
Finally, each group is converted to a decimal number, yielding 1 (0001) and 13 (1101), which are then mapped to 1 and D respectively, in the hexadecimal system.
The number 29 is therefore represented by 1D in hexadecimal.

A similar logic can be applied to the method charToHex. The only difference is instead of converting a single byte, you are converting 2, since a char is 2 bytes.

Bit shift and convert character to unicode escape string

Answers (2)

Related Questions