solusipse

Reputation: 587

Strange values when converting bytes to integer in C

Take a look at this code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int byteToInt(char *bytes) {
    int32_t v = 
        (bytes[0]      ) +
        (bytes[1] << 8 ) +
        (bytes[2] << 16) +
        (bytes[3] << 24);
    return v;
}

int main() {
    char b1[] = {0xec, 0x51, 0x04, 0x00};
    char b2[] = {0x0c, 0x0c, 0x00, 0x00};

    printf("%d\n", byteToInt(b1));
    printf("%d\n", byteToInt(b2));
    printf("%d\n", *(uint32_t *)b1);
    printf("%d\n", *(uint32_t *)b2);

    return 0;
}

{0xec, 0x51, 0x04, 0x00} is equal to 283116, but when I use the byteToInt function, it returns, for some reason, 282860. There are some byte arrays that cause similar trouble, and I noticed the value is always off by 256. Still, most cases work without any problems: just take a look at b2, which is calculated as 3084, and that is correct. The casting method works perfectly in these cases, but I'd like to know why the described problems happen. Could someone please explain this to me?

Upvotes: 0

Views: 60

Answers (2)

AlexD

Reputation: 32576

Perhaps char is a signed type (it is implementation-defined), and (int)(char)(0xec) is -20, while (int)(unsigned char)(0xec) is 236.

Try to use unsigned char and uint32_t.

#include <stdio.h>
#include <stdint.h>

uint32_t byteToInt(unsigned char *bytes) {
    uint32_t v =
        ((uint32_t)bytes[0]) +
        ((uint32_t)bytes[1] << 8) +
        ((uint32_t)bytes[2] << 16) +
        ((uint32_t)bytes[3] << 24);
    return v;
}

int main() {
    unsigned char b1[] = { 0xec, 0x51, 0x04, 0x00 };
    unsigned char b2[] = { 0x0c, 0x0c, 0x00, 0x00 };

    printf("%u\n", byteToInt(b1));     // 'u' for unsigned
    printf("%u\n", byteToInt(b2));
    //printf("%u\n", *(uint32_t *)b1); // undefined behavior
    //printf("%u\n", *(uint32_t *)b2); // ditto

    return 0;
}

Note that reinterpreting memory through a pointer cast, as done in the last two printfs, is undefined behavior under the strict-aliasing and alignment rules (although it often works in practice).

BTW, shifting signed negative values is undefined according to the standard:

The result of E1 << E2 is E1 left-shifted E2 bit positions; ... If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Upvotes: 4

Jeffrey Naujok

Reputation: 1

There are several potential issues with this code. The first is that it is implementation-defined whether plain char is signed or unsigned, so a byte such as 0xec may become a negative value. When a char is used in a shift operation it is first promoted to int, and a negative operand corrupts the additions (and shifting a negative value is itself undefined).

It is safer to first cast the values to a 32 bit type before shifting them and adding them. For example:

unsigned long v = 
    ((unsigned long)bytes[0]      ) +
    ((unsigned long)bytes[1] << 8 ) +
    ((unsigned long)bytes[2] << 16) +
    ((unsigned long)bytes[3] << 24);

Your use of int32_t also needs care: it comes from the C99 header <stdint.h>, which your code does not include. Plain "int" itself is compiler dependent; older compilers may make it only 16 bits, since the standard only requires it to hold at least that range. Using "long" instead of "int" guarantees at least a 32 bit value.

Additionally, I used "unsigned long" in the example, because I don't think you want to deal with negative numbers in this case. In two's-complement representation, negative numbers have the highest bit set (0x80000000).

If you do want negative numbers, then the type should be "long" instead, although this opens a different can of worms when adding positive-valued bytes to a sign-carrying highest byte. In that case you can instead assemble the value as unsigned, peel off the high bit of the high byte before adding, and then, if that bit was set, subtract 2^31 to recover the two's-complement value (the details of that representation are probably outside the scope of this question).

The revised code, then would be:

#include <stdio.h>
#include <stdlib.h>

unsigned long byteToInt(const unsigned char *bytes) {
    unsigned long v = 
        ((unsigned long)bytes[0]      ) +
        ((unsigned long)bytes[1] << 8 ) +
        ((unsigned long)bytes[2] << 16) +
        ((unsigned long)bytes[3] << 24);
    return v;
}

int main() {
    unsigned char b1[] = {0xec, 0x51, 0x04, 0x00};
    unsigned char b2[] = {0x0c, 0x0c, 0x00, 0x00};

    printf("%lu\n", byteToInt(b1));  /* "%lu" matches unsigned long */
    printf("%lu\n", byteToInt(b2));
    /* Casting the arrays to unsigned long * would be undefined
       behavior, and where long is 8 bytes it would also read past
       the ends of the 4-byte arrays, so those lines are omitted. */

    return 0;
}

Upvotes: 0
