char array to long results in unexpected value

I tried to convert a byte array to a long

long readAndSkipLong(char*& b)
{
    unsigned long ret = (b[0] << 56) | (b[1] << 48) | (b[2] << 40) | (b[3]<<32) | (b[4] << 24) | (b[5] << 16) | (b[6] << 8) | (b[7]);
    return ret;
}

My shifting seems to be not right. For the intended value

152  --> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 10011000

I get:

-104  --> 11111111 11111111 11111111 11111111 11111111 11111111 11111111 10011000

Any idea where the bug is?

Upvotes: 1

Answers (3)

phuclv

Reputation: 41932

2 things:

char can be signed or unsigned, hence shouldn't be used storing for data types other than characters.

In C, C++ and most C-like languages, any types narrower than int must be promoted to int in an expression, and your statement will be treated like this
```
 unsigned long ret = ((int)b[0] << 56) | ((int)b[1] << 48)
                   | ((int)b[2] << 40) | ((int)b[3] << 32)
                   | ((int)b[4] << 24) | ((int)b[5] << 16)
                   | ((int)b[6] <<  8) | ((int)b[7]);
```
If char is signed, it'll be promoted to int using sign extension. As a result the top bits will be filled with 1s if the byte value is negative.

In MSVC char is signed by default. You can use /J to make char unsigned, which will solve part of your problem. But then another problem arises:
In Windows long is a 32-bit type, consequently you can't pack 8 bytes into it. Moreover int is also 32-bit on most modern systems, and after promoting b[i] to int shifting more than 31 is undefined behavior which is what your program does.

So to fix all the problems portably you need to:

Cast all b[i] to unsigned char or uint8_t, or masking the high bits by ANDing it with 0xFF like 0605002 suggested. Or simply change the type of b to unsigned char&* instead of char&*
Change the return type to a type at least 64-bit wide like (unsigned) long long, (u)int64_t or (u)int_least64_t

The result may look like this

uint64_t readAndSkipLong(unsigned char*& b)
{
    return ((uint64_t)b[0] << 56) | ((uint64_t)b[1] << 48)
         | ((uint64_t)b[2] << 40) | ((uint64_t)b[3] << 32)
         | ((uint64_t)b[4] << 24) | ((uint64_t)b[5] << 16)
         | ((uint64_t)b[6] <<  8) | ((uint64_t)b[7]);
}

uint64_t readAndSkipLong(char*& b)
{
    return ((uint64_t)(uint8_t)b[0] << 56) | ((uint64_t)(uint8_t)b[1] << 48)
         | ((uint64_t)(uint8_t)b[2] << 40) | ((uint64_t)(uint8_t)b[3] << 32)
         | ((uint64_t)(uint8_t)b[4] << 24) | ((uint64_t)(uint8_t)b[5] << 16)
         | ((uint64_t)(uint8_t)b[6] <<  8) | ((uint64_t)(uint8_t)b[7]);
}

However you don't actually need to write a function to reverse endian. There are already ntohll() and htonll() for that purpose

reversedEndian = ntohll(originalValue);

If the input must be a char array then just copy the value to a uint64_t

memcpy(&originalValue, &b, sizeof originalValue);
reversedEndian = ntohll(originalValue);

You can further reduce the whole thing to reversedEndian = ntohll(*(int64_t*)&b); if strict aliasing is allowed because on x86 unaligned access is generally permitted

Upvotes: 0

Sid Heroor

Reputation: 653

Couple of things to think about

include cstdint and use std::uint64_t and std::uint8_t for you types so that there is no issues with sign.
The logic also depends on whether your machine is little endian or big endian. For Little Endian machines, you need to put your least significant byte first and then go higher. Your logic is for Big Endian.
It's possible that you are having a count overflow. A better way would be to explicitly declare a uint64_t and use that.

Here's a bit of code I wrote for bytes to uint64_t on a little endian machine.

std::uint64_t bytesToUint64(std::uint8_t* b) {
    std::uint64_t msb = 0x0u;
    for (int i(0); i < 7; i++) {
        msb |= b[i];
        msb <<= 8;
    }
    msb |= b[7];

    return msb;
}

EDIT by OP (implemented Tip 1):

long readAndSkipLong(char*& b)
{
    std::uint64_t ret = 
        ((std::uint8_t)b[0] << 56) | 
        ((std::uint8_t)b[1] << 48) | 
        ((std::uint8_t)b[2] << 40) | 
        ((std::uint8_t)b[3] << 32) | 
        ((std::uint8_t)b[4] << 24) | 
        ((std::uint8_t)b[5] << 16) | 
        ((std::uint8_t)b[6] <<  8) | 
        ((std::uint8_t)b[7]);
    b+=8;

    return ret;
}

Upvotes: -1

Sufian Latif

Reputation: 13356

It's because of type promotion and sign extension. Every value in your char array is signed, and bit-shifting is an integer operation. When you use the shifting operator, it evaluates to an int, and because your chars are signed, shifting them would produce signed ints.

The last (rightmost) byte has 1 as the sign bit. When promoted to an int, its value becomes -104 by sign extension. As you ORed the rest of the numbers, all the 1 bits remained unaffected.

To avoid this problem, you can cast each chars to unsigned long before shifting and ORing.

Another thing you can do is bitwise ANDing each char with 0xff like ((b[i] & 0xff) << 24). ANDing with 0xff would produce an int, keeping the least significant 8 bits intact and zeroes to the left, no sign extension.

Upvotes: 3

char array to long results in unexpected value

Answers (3)

Related Questions