Reputation: 423

simple bit manipulation fails

I am learning bit manipulation in C and I have written a simple program. However the program fails. Can someone please look into this code? Basically I want to extract and reassemble a 4 byte 'long' variable to its induvidual bytes and vice versa. Here is my code:

    printf("sizeof char= %d\n", sizeof(char));
    printf("sizeof unsigned char= %d\n", sizeof(unsigned char));
    printf("sizeof int= %d\n", sizeof(int));
    printf("sizeof long= %d\n", sizeof(long));
    printf("sizeof unsigned long long= %d\n", sizeof(unsigned long long));

    long val = 2;
    int k = 0;
    size_t len = sizeof(val);
    printf("val = %ld\n", val);
    printf("len = %d\n", len);

    char *ptr;
    ptr = (char *)malloc(sizeof(len));
    //converting 'val' to char array
    //val = b3b2b1b0 //where 'b is 1 byte. Since 'long' is made of 4 bytes, and char is 1 byte, extracting byte by byte of long into char
    //do{
        //val++;
    for(k = 0; k<len; k++){
        ptr[k] = ((val >> (k*len)) && 0xFF);
        printf("ptr[%d] = %02X\n", k,ptr[k]);
    }
    //}while(val < 12);

    //reassembling the bytes from char and converting them to long
    long xx = 0;
    int m = 0;
    for(m = 0; m< len; m++){
        xx = xx |(ptr[m]<<(m*8));
    }
    printf("xx= %ld\n", xx);

Why don't I see xx returning 2?? Also, irrespective of the value of 'val', the ptr[0] seems to store 1 :( Please help

Thanks in advance

Upvotes: 0

Answers (3)

autistic

Reputation: 15642

As others have by now mentioned, I'm not sure if ptr[k] = ((val >> (k*len)) && 0xFF); does what you want it to. The && operator is a boolean operator. If (value >> (k*len)) is some non-zero value, and 0xFF is some non-zero value, then the value stored into ptr[k] will be one. That's the way boolean operators work. Perhaps you meant to use & instead of &&.

Additionally, you've chosen to use shift operators, which is appropriate for unsigned types, but has a variety of non-portable aspects for signed types. xx = xx |(ptr[m]<<(m*8)); potentially invokes undefined behaviour, for example, because it looks like it could result in signed integer overflow.

In C, sizeof (char) is always 1, because the sizeof operator tells you how many chars are used to represent a type. eg. sizeof (int) tells you how many chars are used to represent ints. It's CHAR_BIT that changes. Thus, your code shouldn't rely upon the sizeof a type.

In fact, if you want your code to be portable, then you shouldn't be expecting to be able to store values greater than 32767 or less than -32767 in an int, for example. This is regardless of size, because padding bits might exist. To summarise: the sizeof a type doesn't necessarily reflect the set of values it can store!

Choose the types of your variables for their application, portably. If your application doesn't need values beyond that range, then int will do fine. Otherwise, you might want to think about using a long int, which can store values between (and including) -2147483647 and 2147483647, portably. If you need values beyond that, use a long long int, which will give you the guaranteed range consisting of at least the values between -9223372036854775807 and 9223372036854775807. Anything beyond that probably deserves a multi-precision arithmetic library such as GMP.

When you don't expect to use negative values, you should use unsigned types.

With consideration given to your portable choice of integer type, it now makes sense that you can devise a portable way to write those integers into files, and read those integers from files. You'll want to extract the sign and absolute value into unsigned int:

unsigned int sign = val < 0; /* conventionally 1 for negative, 0 for positive */
unsigned int abs_val = val;
if (val < 0) { abs_val = -abs_val; }

... and then construct an array of 8-bit chunks of abs_val and sign, merged together. We've already decided using portable decision-making that our int can only store 16 bits, because we're only ever storing values between -32767 and 32767 in it. As a result, there is no need for a loop, or bitwise shifts. We can use multiplication to move our sign bit, and division/modulo to reduce our absolute value. Consider that the sign conventionally goes with the most significant bit, which is either at the start (big endian) or the end (little endian) of our array.

unsigned char big_endian[] = { sign * 0x80 + abs_val / 0x100,
                               abs_value % 0x100 };
unsigned char lil_endian[] = { abs_value % 0x100,
                               sign * 0x80 + abs_val / 0x100 };

To reverse this process, we perform the opposite operations in reverse of each other (that is, using division and modulo in place of multiplication, multiplication in place of division and addition, extract the sign bit and reform the value):

unsigned int big_endian_sign = array[0] / 0x80;
int big_endian_val = big_endian_sign
                   ? -((array[0] % 0x80) * 0x100 + array[1])
                   :  ((array[0] % 0x80) * 0x100 + array[1]);

unsigned int lil_endian_sign = array[1] / 0x80;
int lil_endian_val = lil_endian_sign
                   ? -((array[1] % 0x80) * 0x100 + array[0])
                   :  ((array[1] % 0x80) * 0x100 + array[0]);

The code gets a little more complex for long, and it becomes worthwhile to use binary operators. The extraction of sign and absolute value remains essentially the same, with the only changes being the type of the variables. We still don't need loops, because we made a decision that we only care about values representable portably. Here's how I'd convert from a long val to an unsigned char[4]:

unsigned long sign = val < 0; /* conventionally 1 for negative, 0 for positive */
unsigned long abs_val = val;
if (val < 0) { abs_val = -abs_val; }

unsigned char big_endian[] = { (sign << 7) | ((abs_val >> 24) & 0xFF),
                               (abs_val >> 16) & 0xFF,
                               (abs_val >> 8) & 0xFF,
                               abs_val & 0xFF };
unsigned char lil_endian[] = { abs_val & 0xFF,
                               (abs_val >> 8) & 0xFF,
                               (abs_val >> 16) & 0xFF,
                               (sign << 7) | ((abs_val >> 24) & 0xFF) };

... and here's how I'd convert back to the signed value:

unsigned int big_endian_sign = array[0] >> 7;
long big_endian_val = big_endian_sign
                   ? -((array[0] & 0x7F) << 24) + (array[1] << 16) + (array[2] << 8) + array[3]
                   :  ((array[0] & 0x7F) << 24) + (array[1] << 16) + (array[2] << 8) + array[3];

unsigned int lil_endian_sign = array[3] >> 7;
long lil_endian_val = lil_endian_sign
                   ? -((array[3] & 0x7F) << 24) + (array[2] << 16) + (array[1] << 8) + array[0]
                   :  ((array[3] & 0x7F) << 24) + (array[2] << 16) + (array[1] << 8) + array[0];

I'll leave you to devise a scheme for unsigned and long long types... and open up the floor for comments:

Upvotes: 1

Simon

Reputation: 6363

ptr[k] = ((val >> (k*len)) && 0xFF);

Should be

ptr[k] = ((val >> (k*8)) & 0xFF);

&& is used in conditional statements and & for bitwise and. Also as you're splitting the value up into chars, each iteration of the loop you want to shift with as many bits as are in a byte. This is almost always 8 but can be something else. The header file limits.h has the info about that.

Upvotes: 4

Taylor Brandstetter

Reputation: 3623

A few things I notice:

You're using the boolean && operator instead of bitwise &
You're shifting by "k*len" instead of "k*8"
You're allocating an array with "sizeof(len)", instead of just "len"
You're using "char" instead of "unsigned char". This will make the "(ptr[m]<<(m*8))" expression sometimes give you a negative number.

So a fixed version of your code would be:

printf("sizeof char= %d\n", sizeof(char));
printf("sizeof unsigned char= %d\n", sizeof(unsigned char));
printf("sizeof int= %d\n", sizeof(int));
printf("sizeof long= %d\n", sizeof(long));
printf("sizeof unsigned long long= %d\n", sizeof(unsigned long long));

long val = 2;
int k = 0;
size_t len = sizeof(val);
printf("val = %ld\n", val);
printf("len = %d\n", len);

unsigned char *ptr;
ptr = (unsigned char *)malloc(len);
//converting 'val' to char array
//val = b3b2b1b0 //where 'b is 1 byte. Since 'long' is made of 4 bytes, and char is 1 byte, extracting byte by byte of long into char
//do{
    //val++;
for(k = 0; k<len; k++){
    ptr[k] = ((val >> (k*8)) & 0xFF);
    printf("ptr[%d] = %02X\n", k,ptr[k]);
}
//}while(val < 12);

//reassembling the bytes from char and converting them to long
long xx = 0;
int m = 0;
for(m = 0; m< len; m++){
    xx = xx |(ptr[m]<< m*8);
}
printf("xx= %ld\n", xx);

Also, in the future, questions like this would be better suited to https://codereview.stackexchange.com/

Upvotes: 3

simple bit manipulation fails

Answers (3)

Related Questions