avmohan

Reputation: 1980

Copying a 4 element character array into an integer in C

A char is 1 byte and an integer is 4 bytes. I want to copy byte-by-byte from a char[4] into an integer. I thought of different methods but I'm getting different answers.

char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);

Output is 6513249 1633837824 6513249

Which one is correct? What is going wrong?

Upvotes: 9

Views: 10356

Answers (6)

lupy87

Reputation: 1

If you're using the CVI (National Instruments) compiler, you can use the function Scan to do this:

unsigned int a;

For big endian: Scan(str,"%1i[b4uzi1o3210]>%i",&a);

For little endian: Scan(str,"%1i[b4uzi1o0123]>%i",&a);

The o modifier specifies the byte order. i inside the square brackets indicates where to start in the str array.

Upvotes: 0

Jon

Reputation: 437386

It's an endianness issue. When you interpret the char* as an int* the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86 which is little endian), while with the manual conversion the first byte becomes the most significant.

To put this into pictures, this is the source array:

   a      b      c      \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 |  <---- bytes in memory
+------+------+------+------+

When these bytes are interpreted as an integer in a little endian architecture the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300 -- decimal 1633837824.
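To see which of the two interpretations the host itself uses, the same four bytes can be copied into an integer and compared against both values. This is only a sketch; the function names are made up for illustration, and it assumes a platform that provides uint32_t:

```c
#include <stdint.h>
#include <string.h>

/* Copies the bytes 'a', 'b', 'c', '\0' into a 32-bit integer and reports
   whether the host read them with the first byte least significant. */
int host_is_little_endian(void)
{
    const unsigned char bytes[4] = { 0x61, 0x62, 0x63, 0x00 };
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value == UINT32_C(0x00636261);
}

/* Same check for the opposite order: first byte most significant. */
int host_is_big_endian(void)
{
    const unsigned char bytes[4] = { 0x61, 0x62, 0x63, 0x00 };
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value == UINT32_C(0x61626300);
}
```

On the x86 machine from the question, host_is_little_endian() would return 1 and host_is_big_endian() would return 0.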

Of course treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion. There is however a way to achieve the same result, which is called type punning:

union {
    char str[4];
    unsigned int ui;
} u;

strcpy(u.str, "abc");
printf("%u\n", u.ui);

Upvotes: 15

Eric Postpischil

Reputation: 222754

Neither of the first two is correct.

The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:

unsigned int a; memcpy(&a, &str, sizeof a);

(Presuming the size of an unsigned int and the size of str are the same.)

The second may fail with integer overflow because str[0] is promoted to an int, so str[0]<<24 has type int, but the value required by the shift may be larger than is representable in an int. To remedy this, use:

unsigned int b = (unsigned int) str[0] << 24 | …;

This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
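Put together, the two corrected approaches might look like the sketch below. The function names are illustrative, and the code assumes unsigned int is 4 bytes, as in the question. The extra cast through unsigned char is an addition not spelled out above: plain char may be signed, so a byte of 0x80 or higher would otherwise be negative and set the upper bits when converted to unsigned int.

```c
#include <string.h>

/* Host byte order: copy the bytes, avoiding aliasing and alignment problems. */
unsigned int load_host_order(const char *bytes)
{
    unsigned int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Big-endian order, whatever the host uses. Each byte is converted to
   unsigned char first, then widened to unsigned int before shifting. */
unsigned int load_big_endian(const char *bytes)
{
    return (unsigned int)(unsigned char)bytes[0] << 24
         | (unsigned int)(unsigned char)bytes[1] << 16
         | (unsigned int)(unsigned char)bytes[2] << 8
         | (unsigned int)(unsigned char)bytes[3];
}
```

With char str[4] = "abc", load_big_endian(str) gives 1633837824 on any host, while load_host_order(str) gives 6513249 on a little-endian machine.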

Upvotes: 6

Curd

Reputation: 12427

Both are correct in a way:

  • Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.

  • Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.

What is correct depends on how the original data (array of char) is meant to be interpreted.
For example, Java class files always use big-endian byte order, no matter what the CPU uses. So if you want to read ints from a Java class file, you have to use the second way. In other cases you might want the CPU-dependent way (I think Matlab writes ints to files in native byte order, cf. this question).
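As a concrete case, a Java class file begins with the four-byte magic number 0xCAFEBABE, stored most significant byte first. A rough sketch of checking it with the second approach follows; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Assembles four bytes in big-endian order, independent of the host CPU. */
uint32_t read_u4_big_endian(const unsigned char *p)
{
    return (uint32_t)p[0] << 24
         | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8
         | (uint32_t)p[3];
}

/* Returns nonzero if the buffer starts with the class file magic number. */
int has_class_file_magic(const unsigned char *buf, size_t len)
{
    return len >= 4 && read_u4_big_endian(buf) == UINT32_C(0xCAFEBABE);
}
```

Reading the same bytes in native order on a little-endian CPU would produce 0xBEBAFECA instead, so the check would fail there.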

Upvotes: 1

user2363448

Reputation:

You said you want to copy byte-by-byte.

That means the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk).

It just needs some tweaking:

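const char *str = "abc";   /* four bytes, counting the terminating '\0' */
size_t i;
unsigned a;
char *c = (char *)&a;      /* accessing an object's bytes through char * is allowed */
for (i = 0; i < sizeof(unsigned); i++) {
   c[i] = str[i];
}
printf("%u\n", a);         /* %u, because a is unsigned */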

Bear in mind, the data you're reading may not share the same endianness as the machine you're reading from. This might help:

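#include <stdint.h>
#include <string.h>

/* Reverses the byte order of the 32-bit value that data points to. */
void
changeEndian32(void * data)
{
    uint8_t * cp = (uint8_t *) data;
    union
    {
        uint32_t word;
        uint8_t bytes[4];
    }temp;

    temp.bytes[0] = cp[3];
    temp.bytes[1] = cp[2];
    temp.bytes[2] = cp[1];
    temp.bytes[3] = cp[0];
    /* memcpy instead of a uint32_t * store, so data need not be aligned */
    memcpy(data, &temp.word, sizeof temp.word);
}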
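If the value is already held in an integer, the same reversal can be done with shifts and masks, without a union or pointer casts. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Returns v with its four bytes in reverse order. */
uint32_t swap_endian32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & UINT32_C(0x0000FF00))
         | ((v << 8) & UINT32_C(0x00FF0000))
         | (v << 24);
}
```

Applied to the numbers from the question, swap_endian32(0x61626300) gives 0x00636261, and swapping twice returns the original value.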

Upvotes: 1

ouah

Reputation: 145839

unsigned int a = *(unsigned int*)str;

This initialization is not correct and invokes undefined behavior. It violates C aliasing rules and potentially violates processor alignment requirements.
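The alignment concern is easiest to see when the four bytes sit at an odd offset inside a larger buffer. A sketch of a safe read using memcpy follows; the function name is hypothetical, and the caller is assumed to guarantee that offset + 4 does not run past the end of the buffer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads four bytes starting at buf + offset, in host byte order.
   memcpy works for any offset, aligned or not. */
uint32_t read_u32_at(const unsigned char *buf, size_t offset)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Casting buf + 1 to a uint32_t pointer and dereferencing it would be the problematic form; read_u32_at(buf, 1) has defined behavior.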

Upvotes: 1
