Reputation: 689

c Code that reads a 4 byte little endian number from a buffer

I encountered this piece of C code that's existing. I am struggling to understand it.

I supposidly reads a 4 byte unsigned value passed in a buffer (in little endian format) into a variable of type "long".

This code runs on a 64 bit word size, little endian x86 machine - where sizeof(long) is 8 bytes. My guess is that this code is intended to also run on a 32 bit x86 machine - so a variable of type long is used instead of int for sake of storing value from a four byte input data.

I am having some doubts and have put comments in the code to express what I understand, or what I don't :-)

Please answer questions below in that context

void read_Value_From_Four_Byte_Buff( char*input)
{
    /* use long so on 32 bit machine, can still accommodate 4 bytes */ 
    long intValueOfInput;  

    /* Bitwise and of input buffer's byte 0 with 0xFF gives MSB or LSB ?*/
    /* This code seems to assume that assignment will store in rightmost byte - is that true on a x86 machine ?*/
    intValueOfInput =  0xFF & input[0];

    /*left shift byte-1 eight times, bitwise "or" places in 2nd byte frm right*/
    intValueOfInput |= ((0xFF & input[1]) << 8);

    /* similar left shift in mult. of 8 and bitwise "or" for next two bytes */
    intValueOfInput |= ((0xFF & input[2]) << 16);
    intValueOfInput |= ((0xFF & input[3]) << 24);

}

My questions

1) The input buffer is expected to be in "Little endian". But from code looks like assumption here is that it read in as Byte 0 = MSB, Byte 1, Byte 2, Byte 3= LSB. I thought so because code reads bytes starting from Byte 0, and subsequent bytes ( 1 onwards) are placed in the target variable after left shifting. Is that how it is or am I getting it wrong ?

2) I feel this is a convoluted way of doing things - is there a simpler alternative to copy value from 4 byte buffer into a long variable ?

3) Will the assumption "that this code will run on a 64 bit machine" will have any bearing on how easily I can do this alternatively? I mean is all this trouble to keep it agnostic to word size ( I assume its agnostic to word size now - not sure though) ?

Thanks for your enlightenment :-)

Upvotes: 1

Answers (3)

David Grayson

Reputation: 87516

The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
I don't think you can get much simpler than this unless you use an external function, or make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances, it would work to just write uint32_t x = *(uint32_t *)input; but this assumes your machine is little endian and I think it might be undefined behavior according to the C standard.
No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.

The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:

((0xFF & input[3]) << 24)

The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.

A solution is to convert it from an int to a uint32_t before shifting it, using a cast.

Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?

Taking all this into account, I would rewrite the function like this:

uint32_t read_value_from_four_byte_buff(char * input)
{
    uint32_t x;
    x = 0xFF & input[0];
    x |= (0xFF & input[1]) << 8;
    x |= (0xFF & input[2]) << 16;
    x |= (uint32_t)(0xFF & input[3]) << 24;
    return x;
}

Upvotes: 2

Barmar

Reputation: 782509

You have it backwards. When you left shift, you're putting into more significant bits. So (0xFF & input[3]) << 24) puts Byte 3 into the MSB.
This is the way to do it in standard C. POSIX has the function ntohl() that converts from network byte order to a native 32-bit integer, so this is usually used in Unix/Linux applications.
This will not work exactly the same on a 64-bit machine, unless you use unsigned long instead of long. As currently written, the highest bit of input[3] will be put into the sign bit of the result (assuming a twos-complement machine), so you can get negative results. If long is 64 bits, all the results will be positive.

Upvotes: 3

Danke Xie

Reputation: 1797

From the code, Byte 0 is LSB, Byte 3 is MSB. But there are some typos. The lines should be

intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);

You can make the code shorter by dropping 0xFF but using the type "unsigned char" in the argument type.

To make the code shorter, you can do:

long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
    intValueOfInput |= ((unsigned char)input[i]) << shift;

Upvotes: 2

c Code that reads a 4 byte little endian number from a buffer

Answers (3)

Related Questions