Reputation: 3809
I tried to convert a byte array to a long
long readAndSkipLong(char*& b)
{
unsigned long ret = (b[0] << 56) | (b[1] << 48) | (b[2] << 40) | (b[3]<<32) | (b[4] << 24) | (b[5] << 16) | (b[6] << 8) | (b[7]);
return ret;
}
My shifting seems to be not right. For the intended value
152 --> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 10011000
I get:
-104 --> 11111111 11111111 11111111 11111111 11111111 11111111 11111111 10011000
Any idea where the bug is?
Upvotes: 1
Views: 379
Reputation: 41932
2 things:
char
can be signed or unsigned, hence shouldn't be used storing for data types other than characters.
In C, C++ and most C-like languages, any types narrower than int
must be promoted to int
in an expression, and your statement will be treated like this
unsigned long ret = ((int)b[0] << 56) | ((int)b[1] << 48)
| ((int)b[2] << 40) | ((int)b[3] << 32)
| ((int)b[4] << 24) | ((int)b[5] << 16)
| ((int)b[6] << 8) | ((int)b[7]);
If char
is signed, it'll be promoted to int
using sign extension. As a result the top bits will be filled with 1s if the byte value is negative.
In MSVC char
is signed by default. You can use /J
to make char unsigned, which will solve part of your problem. But then another problem arises:
In Windows long
is a 32-bit type, consequently you can't pack 8 bytes into it. Moreover int
is also 32-bit on most modern systems, and after promoting b[i]
to int shifting more than 31 is undefined behavior which is what your program does.
So to fix all the problems portably you need to:
b[i]
to unsigned char
or uint8_t
, or masking the high bits by ANDing it with 0xFF like 0605002 suggested. Or simply change the type of b
to unsigned char&*
instead of char&*
(unsigned) long long
, (u)int64_t
or (u)int_least64_t
The result may look like this
uint64_t readAndSkipLong(unsigned char*& b)
{
return ((uint64_t)b[0] << 56) | ((uint64_t)b[1] << 48)
| ((uint64_t)b[2] << 40) | ((uint64_t)b[3] << 32)
| ((uint64_t)b[4] << 24) | ((uint64_t)b[5] << 16)
| ((uint64_t)b[6] << 8) | ((uint64_t)b[7]);
}
or
uint64_t readAndSkipLong(char*& b)
{
return ((uint64_t)(uint8_t)b[0] << 56) | ((uint64_t)(uint8_t)b[1] << 48)
| ((uint64_t)(uint8_t)b[2] << 40) | ((uint64_t)(uint8_t)b[3] << 32)
| ((uint64_t)(uint8_t)b[4] << 24) | ((uint64_t)(uint8_t)b[5] << 16)
| ((uint64_t)(uint8_t)b[6] << 8) | ((uint64_t)(uint8_t)b[7]);
}
However you don't actually need to write a function to reverse endian. There are already ntohll()
and htonll()
for that purpose
reversedEndian = ntohll(originalValue);
If the input must be a char
array then just copy the value to a uint64_t
memcpy(&originalValue, &b, sizeof originalValue);
reversedEndian = ntohll(originalValue);
You can further reduce the whole thing to reversedEndian = ntohll(*(int64_t*)&b);
if strict aliasing is allowed because on x86 unaligned access is generally permitted
Upvotes: 0
Reputation: 653
Couple of things to think about
cstdint
and use std::uint64_t
and std::uint8_t
for you types so that there is no issues with sign.uint64_t
and use that.Here's a bit of code I wrote for bytes to uint64_t
on a little endian machine.
std::uint64_t bytesToUint64(std::uint8_t* b) {
std::uint64_t msb = 0x0u;
for (int i(0); i < 7; i++) {
msb |= b[i];
msb <<= 8;
}
msb |= b[7];
return msb;
}
EDIT by OP (implemented Tip 1):
long readAndSkipLong(char*& b)
{
std::uint64_t ret =
((std::uint8_t)b[0] << 56) |
((std::uint8_t)b[1] << 48) |
((std::uint8_t)b[2] << 40) |
((std::uint8_t)b[3] << 32) |
((std::uint8_t)b[4] << 24) |
((std::uint8_t)b[5] << 16) |
((std::uint8_t)b[6] << 8) |
((std::uint8_t)b[7]);
b+=8;
return ret;
}
Upvotes: -1
Reputation: 13356
It's because of type promotion and sign extension. Every value in your char
array is signed, and bit-shifting is an integer operation. When you use the shifting operator, it evaluates to an int
, and because your char
s are signed, shifting them would produce signed int
s.
The last (rightmost) byte has 1
as the sign bit. When promoted to an int
, its value becomes -104
by sign extension. As you ORed the rest of the numbers, all the 1
bits remained unaffected.
To avoid this problem, you can cast each char
s to unsigned long
before shifting and ORing.
Another thing you can do is bitwise ANDing each char
with 0xff
like ((b[i] & 0xff) << 24)
. ANDing with 0xff
would produce an int
, keeping the least significant 8 bits intact and zeroes to the left, no sign extension.
Upvotes: 3