Reputation: 1980
A char is 1 byte and an integer is 4 bytes. I want to copy byte-by-byte from a char[4] into an integer. I thought of different methods but I'm getting different answers.
char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);
Output is 6513249 1633837824 6513249
Which one is correct? What is going wrong?
Upvotes: 9
Views: 10356
Reputation: 1
If you're using the CVI (National Instruments) compiler, you can use the function Scan to do this:
unsigned int a;
For big endian: Scan(str,"%1i[b4uzi1o3210]>%i",&a);
For little endian: Scan(str,"%1i[b4uzi1o0123]>%i",&a);
The o modifier specifies the byte order. i inside the square brackets indicates where to start in the str array.
Upvotes: 0
Reputation: 437386
It's an endianness issue. When you interpret the char* as an int*, the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86, which is little-endian), while with the manual conversion the first byte becomes the most significant.
To put this into pictures, this is the source array:
    a      b      c     \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 | <---- bytes in memory
+------+------+------+------+
When these bytes are interpreted as an integer on a little-endian architecture, the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300, which is decimal 1633837824.
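To see the memory layout from the other direction, here is a minimal sketch that copies an integer's bytes out in address order. The helper names bytes_in_memory and host_is_little_endian are made up for illustration, and uint32_t stands in for the 4-byte unsigned int assumed in the question.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of v into out[], lowest memory address first. */
void bytes_in_memory(uint32_t v, unsigned char out[4])
{
    memcpy(out, &v, sizeof v);
}

/* Returns 1 when the least significant byte sits at the lowest address. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte stored first in memory */
    return first == 1;
}
```

On a little-endian host, bytes_in_memory(0x00636261u, out) should fill out with 0x61 0x62 0x63 0x00, the same layout as the source array. On a big-endian host it is 0x61626300u that produces that layout.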
Of course, treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion. There is, however, a way to achieve the same result, which is called type punning:
union {
char str[4];
unsigned int ui;
} u;
strcpy(u.str, "abc");
printf("%u\n", u.ui);
Upvotes: 15
Reputation: 222754
Neither of the first two is correct.
The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:
unsigned int a; memcpy(&a, &str, sizeof a);
(Presuming the size of an unsigned int and the size of str are the same.)
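The size assumption can be checked at compile time, and the same memcpy approach also works at byte offsets where a pointer cast could be misaligned. This is only a sketch: read_uint_at is a hypothetical helper, and the check uses C11's _Static_assert with the 4-byte size from the question.

```c
#include <stddef.h>
#include <string.h>

/* Assumption taken from the question: unsigned int occupies 4 bytes. */
_Static_assert(sizeof(unsigned int) == 4,
               "this sketch assumes a 4-byte unsigned int");

/* Read an unsigned int in host byte order starting at any byte offset.
   memcpy places no alignment or aliasing requirements on the source. */
unsigned int read_uint_at(const char *buf, size_t offset)
{
    unsigned int v;
    memcpy(&v, buf + offset, sizeof v);
    return v;
}
```

The caller is responsible for making sure that at least sizeof(unsigned int) bytes are available at buf + offset.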
The second may fail with integer overflow because str[0] is promoted to an int, so str[0]<<24 has type int, but the value required by the shift may be larger than is representable in an int. To remedy this, use:
unsigned int b = (unsigned int) str[0] << 24 | …;
This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
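Written out in full, the shift approach might look like the sketch below. The names load_be32 and load_le32 are hypothetical. Going through unsigned char is an extra precaution beyond what the answer states: it keeps a negative char value from being sign-extended before the shift.

```c
#include <stdint.h>

/* First byte becomes the most significant, whatever the host byte order. */
uint32_t load_be32(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* First byte becomes the least significant, whatever the host byte order. */
uint32_t load_le32(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For the array "abc" from the question, load_be32 gives 1633837824 and load_le32 gives 6513249 on any host.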
Upvotes: 6
Reputation: 12427
Both are correct in a way:
Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.
Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.
What is correct depends on how the original data (array of char) is meant to be interpreted.
E.g. Java code (class files) always uses big-endian byte order (no matter what the CPU is using). So if you want to read ints from a Java class file, you have to use the second way. In other cases you might want to use the CPU-dependent way (I think Matlab writes ints in native byte order into files, cf. this question).
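As an illustration of the Java case: a class file begins with the 4-byte magic number 0xCAFEBABE, followed by 2-byte minor and major version fields, all stored big-endian. The sketch below decodes them with shifts; struct class_header and parse_class_header are made-up names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

struct class_header {
    uint32_t magic;
    uint16_t minor_version;
    uint16_t major_version;
};

/* Decode the first 8 bytes of a class file.
   Returns 1 if there are enough bytes and the magic matches, else 0. */
int parse_class_header(const unsigned char *buf, size_t len,
                       struct class_header *out)
{
    if (len < 8)
        return 0;
    out->magic = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    out->minor_version = (uint16_t)((buf[4] << 8) | buf[5]);
    out->major_version = (uint16_t)((buf[6] << 8) | buf[7]);
    return out->magic == 0xCAFEBABEu;
}
```

Because only shifts are used, the result does not depend on the byte order of the machine running the parser.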
Upvotes: 1
Reputation:
You said you want to copy byte-by-byte.
That means the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk).
It just needs some tweaking:
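const char *str = "abc";        /* 4 bytes, including the terminating '\0' */
size_t i;
unsigned a;
char *c = (char *)&a;           /* any object may be accessed through a char pointer */
for (i = 0; i < sizeof(unsigned); i++) {
    c[i] = str[i];              /* copy one byte at a time, in host byte order */
}
printf("%u\n", a);              /* %u matches the unsigned argument */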
Bear in mind, the data you're reading may not share the same endianness as the machine you're reading from. This might help:
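#include <stdint.h>
#include <string.h>

/* Reverse the byte order of the 32-bit value stored at data, in place. */
void
changeEndian32(void * data)
{
    uint8_t * cp = (uint8_t *) data;
    union
    {
        uint32_t word;
        uint8_t bytes[4];
    } temp;
    temp.bytes[0] = cp[3];
    temp.bytes[1] = cp[2];
    temp.bytes[2] = cp[1];
    temp.bytes[3] = cp[0];
    /* memcpy avoids the aliasing and alignment problems of a uint32_t* store */
    memcpy(data, &temp.word, sizeof temp.word);
}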
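If the value is already held in a uint32_t, the same reversal can be done on the value itself with shifts and masks, which sidesteps pointer casts entirely. This is a sketch of that alternative; swap32 is a hypothetical name.

```c
#include <stdint.h>

/* Return x with its four bytes in reverse order. */
uint32_t swap32(uint32_t x)
{
    return  (x >> 24)                  /* byte 3 -> byte 0 */
         | ((x >> 8) & 0x0000FF00u)    /* byte 2 -> byte 1 */
         | ((x << 8) & 0x00FF0000u)    /* byte 1 -> byte 2 */
         |  (x << 24);                 /* byte 0 -> byte 3 */
}
```

For example, swap32(0x61626300u) yields 0x00636261u, which are the two values seen in the question.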
Upvotes: 1
Reputation: 145839
unsigned int a = *(unsigned int*)str;
This initialization is not correct and invokes undefined behavior. It violates C aliasing rules and may violate the processor's alignment requirements.
Upvotes: 1