ChaoSXDemon

Reputation: 910

Java Byte[] to int (Big Endian) using << Weirdness

Suppose we have the following byte[4]:

44 a4 8a c6

So what's wrong with the following code?

public static int asIntBigEndian(byte[] raw, int offset){
    int result = 0;
    for(int i=offset; i<offset+4; ++i){
        result = (result << 4) | raw[i];
    }
    return result;
}

The result of calling asIntBigEndian(raw, 0) is:

ff ff ff e6

What I have noticed is that if I were to read the first byte and print it out, I get:

44

I would get the same result if I were to do this:

System.out.println(Integer.toHexString(raw[0] << 24));

0x44000000

So If I were to continue the logic ...

System.out.println(Integer.toHexString((raw[0] << 24) | (raw[1] << 16)));

0xffa40000

Basically the first byte turned into 0xff while the 2nd byte 0xa4 has been "xor" onto the right position. Why is this happening?

Upvotes: 1

Views: 782

Answers (2)

David Ehrmann

Reputation: 7576

In practice, do

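import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public static int asIntBigEndian(byte[] raw, int offset){
    // Wrap the 4 bytes starting at offset; the buffer's position starts at offset.
    ByteBuffer buffer = ByteBuffer.wrap(raw, offset, 4);
    buffer.order(ByteOrder.BIG_ENDIAN); // already the default, but explicit is clearer
    return buffer.getInt();
}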

There's overhead, but it's so easy.

For that matter, in your calling code, you might be better served by a ByteBuffer.
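As a rough sketch of what "use a ByteBuffer in the calling code" could look like: wrap the whole array once and let the relative getInt() advance through it. The class name ByteBufferReadDemo, the helper readAllInts, and the extra trailing bytes 00 00 00 01 are illustrative assumptions, not part of the original question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteBufferReadDemo {
    // Hypothetical helper: reads every complete 4-byte big-endian int in the array, in order.
    static int[] readAllInts(byte[] raw) {
        ByteBuffer buffer = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN);
        int[] values = new int[raw.length / 4];
        for (int i = 0; i < values.length; i++) {
            // Relative getInt() reads 4 bytes and advances the position by 4.
            values[i] = buffer.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        // The question's bytes followed by an assumed second value, 0x00000001.
        byte[] raw = {0x44, (byte) 0xa4, (byte) 0x8a, (byte) 0xc6, 0x00, 0x00, 0x00, 0x01};
        for (int v : readAllInts(raw)) {
            System.out.println(Integer.toHexString(v)); // 44a48ac6, then 1
        }
    }
}
```

No manual shifting or masking is needed here, because getInt() assembles the bytes itself.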

Upvotes: 0

Bytes in Java are signed and range from -128 (-0x80) to 127 (0x7F). 164 (0xA4) is not a valid byte value, but "A4" is what you get by printing -92 (-0x5C) as if it were unsigned.

Converting -0x5C to an int preserves the value, giving -0x0000005C. Printed as an unsigned 32-bit value, -0x0000005C is 0xFFFFFFA4.
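A small sketch of the byte-to-int conversion described above; the class and method names are illustrative only.

```java
class WideningDemo {
    // Implicit widening from byte to int preserves the numeric value, sign included.
    static int widen(byte b) {
        return b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xA4;                               // bit pattern 1010 0100, stored as -92
        System.out.println(b);                              // -92
        System.out.println(widen(b));                       // -92: same value, now 32 bits wide
        System.out.println(Integer.toHexString(widen(b)));  // ffffffa4
    }
}
```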

Another, possibly simpler, way to think about it is to treat all values as unsigned, but treat the conversion as a sign extension, where the top bit gets copied into all the new bits. Thought of this way, 0xA4 is a valid byte and (int)(byte)0xA4 is 0xFFFFFFA4. Same result, easier thought process, but it's a less correct way to think about numbers in Java (not that it makes a difference).
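To make the "copy the top bit into all the new bits" view concrete, here is a hand-written 8-to-32-bit sign extension that can be compared with Java's own cast. The helper signExtend8 is a hypothetical name used only for illustration.

```java
class SignExtensionDemo {
    // Hypothetical helper: copies bit 7 of the low byte into bits 8..31.
    static int signExtend8(int lowByte) {
        int v = lowByte & 0xFF;                       // keep only the low 8 bits
        return (v & 0x80) != 0 ? v | 0xFFFFFF00 : v;  // top bit set: fill the upper 24 bits with 1s
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(signExtend8(0xA4)));  // ffffffa4
        System.out.println(Integer.toHexString((int) (byte) 0xA4));  // ffffffa4, same as the cast
        System.out.println(Integer.toHexString(signExtend8(0x44)));  // 44, top bit clear
    }
}
```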

0xFFFFFFA4 << 16 gives 0xFFA40000, and 0x44000000 | 0xFFA40000 gives 0xFFA40000, which is how you got that result.
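The arithmetic above can be reproduced directly with the question's first two bytes. The method name firstTwoBytes is an assumption made for this sketch.

```java
class OrDemo {
    // Reproduces the question's expression without any masking.
    static int firstTwoBytes(byte[] raw) {
        int first = raw[0] << 24;   // 0x44 << 24 == 0x44000000
        int second = raw[1] << 16;  // 0xFFFFFFA4 << 16 == 0xFFA40000 (top 16 bits shifted out)
        return first | second;      // the sign-extended 1s cover the 0x44
    }

    public static void main(String[] args) {
        byte[] raw = {0x44, (byte) 0xa4, (byte) 0x8a, (byte) 0xc6};
        System.out.println(Integer.toHexString(firstTwoBytes(raw)));  // ffa40000
    }
}
```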

The fix is simple: instead of raw[i], use ((int)raw[i] & 0xFF), or just (raw[i] & 0xFF), since the conversion to int is implicit.
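A minimal sketch of the masking fix applied to the same two-byte expression. The names toUnsigned and firstTwoBytesMasked are illustrative. Byte.toUnsignedInt (available since Java 8) does the same masking.

```java
class MaskDemo {
    // Masking after the implicit widening clears the 24 sign-extended bits, leaving 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    static int firstTwoBytesMasked(byte[] raw) {
        return (toUnsigned(raw[0]) << 24) | (toUnsigned(raw[1]) << 16);
    }

    public static void main(String[] args) {
        byte[] raw = {0x44, (byte) 0xa4, (byte) 0x8a, (byte) 0xc6};
        System.out.println(toUnsigned(raw[1]));                             // 164
        System.out.println(Byte.toUnsignedInt(raw[1]));                     // 164, JDK helper
        System.out.println(Integer.toHexString(firstTwoBytesMasked(raw)));  // 44a40000
    }
}
```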

Also, unrelated to that problem, (result << 4) should be (result << 8). Otherwise you're calculating 0x44000 | 0xA400 | 0x8A0 | 0xC6 instead of 0x44000000 | 0xA40000 | 0x8A00 | 0xC6.
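Putting both fixes together, a sketch of the corrected loop alongside a masked variant that still shifts by 4, which shows the overlap described above. The class name and the shiftByFour helper are hypothetical.

```java
class BigEndianDemo {
    // Corrected version: mask each byte to 0..255 and shift by a full byte (8 bits).
    static int asIntBigEndian(byte[] raw, int offset) {
        int result = 0;
        for (int i = offset; i < offset + 4; ++i) {
            result = (result << 8) | (raw[i] & 0xFF);
        }
        return result;
    }

    // Hypothetical variant: masked, but still shifting by only 4 bits, so the bytes overlap.
    static int shiftByFour(byte[] raw, int offset) {
        int result = 0;
        for (int i = offset; i < offset + 4; ++i) {
            result = (result << 4) | (raw[i] & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] raw = {0x44, (byte) 0xa4, (byte) 0x8a, (byte) 0xc6};
        System.out.println(Integer.toHexString(asIntBigEndian(raw, 0)));  // 44a48ac6
        System.out.println(Integer.toHexString(shiftByFour(raw, 0)));     // 4ece6
    }
}
```

The 4ece6 value is 0x44000 | 0xA400 | 0x8A0 | 0xC6 evaluated, so the bytes are mixed together rather than placed side by side.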

Upvotes: 3
