hungneox
hungneox

Reputation: 9829

Bit shift and convert character to unicode escape string

I found a java class that convert byte or char to hexadecimal value. But I cannot understand the code clearly. Can you explain what the code do or where I can find more resources about this?

public class UnicodeFormatter {

    static public String byteToHex(byte b) {
        // Returns hex String representation of byte b
        char hexDigit[] = {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        char[] array = {hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f]};
        return new String(array);
    }

    static public String charToHex(char c) {
        // Returns hex String representation of char c
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }
} // class

Upvotes: 3

Views: 3760

Answers (2)

João Silva
João Silva

Reputation: 91349

First, let us start with some definitions:

  • A char, in Java, occupies 2 bytes;
  • Each byte is composed by 8 bits;
  • Each hexadecimal digit represents 4 binary digits or bits;

Therefore, a byte can be represented by 2 hexadecimal digits, i.e., two groups of 4 bits. This is exactly what is being done in the byteToHex method: it firsts splits the byte in two groups of 4 bits, and then maps each into an hexadecimal symbol, using the hexDigit array. Since the decimal value of each group of 4 bits can never be greater or equal than 16 (2^4), each group will always have a mapping in the hexDigits array.

For example, suppose you want to convert the number 29 to hexadecimal:

  1. 29 is represented in binary as 00011101;
  2. Splitting 00011101 in two groups of 4 bits yields 0001 and 1101;
  3. Programatically, the first group, 0001 can be obtained by shifting away the least significant 4 bits (1101) from the binary representation of 29. Then, 0001 would become the first 4 bits. This is accomplished in Java with (b >> 4);
  4. The second group, is obtained by b & 0x0f, which is equivalent to 00011101 & 00001111 = 00001101 = 1101. By bit-ANDing the binary number with 0x0f you are clearing (setting to 0) everything except the least significant 4 bits.
  5. Finally, each group is converted to a decimal number, yielding 1 (0001) and 13 (1101), which are then mapped to 1 and D respectively, in the hexadecimal system.
  6. The number 29 is therefore represented by 1D in hexadecimal.

A similar logic can be applied to the method charToHex. The only difference is instead of converting a single byte, you are converting 2, since a char is 2 bytes.

Upvotes: 4

Thomas
Thomas

Reputation: 5094

Basically what it is doing here is the same as turning 23 into a string by changing it to 2*10+3, then turning 2 and 3 into characters.

To break it down, we first divide by 16, since we're working in hex.

b >> 4 means shift the bits 4 spaces, so

12345678 >> 4 = 00001234  

then the value in positions 1234 get looked up in the hexDigit array.

Then we do a modulus operation, also known as getting the remainder. In the decimal example, this is finding the 3 by chopping off everything to the left. For binary, they are using AND here.

0x0f in bits is 00001111, so when ANDed with a byte, it will change the left 4 spaces into 0s, leaving only the right 4.

12345678 & 0x0f = 00005678

and again we look up the value in positions 5678 in the hexDigit array. Note that I'm using 1-8 as position markers, the actual data will be all 0s and 1s.

Edit: The second function does basically the same operation, it uses the same >>> and & functions to split the unicode char into bytes. It appears to be assuming the unicode character is 16 bits, so it shifts it 8 places to get the left 8 bits, and uses & 0xff to get the right 8 bits.

Upvotes: 2

Related Questions