antagon

Reputation: 129

Strange pointer behaviour

I have a buffer containing data that was previously read from a socket. From the information stored in the buffer, I can tell the total data length (91 bytes), and from the specification I know the positions of the other fields I need to retrieve (a 32-bit integer and a 16-bit integer, let's call them uid and suid).

unsigned char buffer[1024];
uint32_t uid;
uint16_t suid;

uid = ntohl ( *((uint32_t*) (buffer + sizeof (struct pktheader))) );

suid = ntohs ( *((uint16_t*) (buffer + sizeof (struct pktheader) + sizeof (uint32_t))) );

This code was cross-compiled for ARM, and for some unexpected reason uid was filled with incorrect bytes: they were part of the buffer, but they sat before (!) the content I meant to retrieve, as if the offset had been calculated incorrectly. Strangely enough, the content of suid was just fine.

Can you explain how this behaviour is even possible? I know it may be difficult from the information I have provided... We can rule out an incorrect value of sizeof (struct pktheader), I have double-checked it. The content of the buffer is as defined in the specification. I even found a working solution that uses memcpy with the same offset calculation to get each part out, so we can pretty much rule out the possibility of mangled data.

I discussed it with a colleague, and his professional guess was that some automatic alignment had kicked in, pointing out that the offset was off by exactly 2 bytes. However, I would like to know more.

Until now I was quite fond of this construction for accessing individual fields stored in buffers.

Upvotes: 0

Views: 103

Answers (1)

marko

Reputation: 9159

This is almost certainly an alignment issue, as @Dark Falcon suggests. Most likely, the CPU ignored the bottom two bits of the address and performed an aligned load from the resulting lower address.
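To make the "2 bytes off" symptom concrete, here is a minimal sketch of that address masking. It assumes the buffer itself starts on a 4-byte boundary and uses a hypothetical header size of 6, since the real sizeof (struct pktheader) is not given in the question. The function name is made up for illustration, and what a misaligned load actually does varies by ARM core and configuration; this models only the case described above.

```c
#include <stddef.h>

/* Hypothetical model: the offset a 32-bit load would effectively use
 * if the CPU dropped the bottom two address bits. Assumes the buffer
 * starts on a 4-byte boundary, so offsets and addresses share the
 * same low bits. */
static size_t effective_word_offset(size_t requested)
{
    return requested & ~(size_t)3;
}
```

With a 6-byte header, the uid load is requested at offset 6 but effective_word_offset(6) yields 4, i.e. two bytes before the intended field. The 16-bit suid would then sit at offset 10, which is already a multiple of 2, so a halfword load there is naturally aligned and unaffected.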

Whilst not supporting unaligned loads might seem like an odd design choice, it's mostly to do with gate count and power consumption. There are two nasty scenarios that the CPU would need to cope with otherwise:

  • The load/store straddles a cache-line boundary
  • The load/store straddles a page boundary

Both of these require a substantial number of gates to fix up - particularly the latter, since an access that straddles a page boundary also straddles a cache-line boundary and may additionally need two address translations. Gates, die real estate and power consumption are at a premium in most ARM parts.

This behaviour is also entirely permitted by the C and C++ standards, which leave access through a misaligned pointer undefined, so the code above is not universally portable.

Overlaying a struct or union over the buffer is not safe either when the wire format contains unaligned fields - the compiler will lay the struct out so as to avoid unaligned accesses, inserting padding where needed, and will not initialise that padding (so never use memcmp to compare such structs). Padding is also precisely what you don't want for wire-format packets.
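A small sketch of the padding problem. The field layout here is hypothetical (a 2-byte tag followed by the uid and suid from the question) and is not the real packet format:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: 2-byte tag, 4-byte uid, 2-byte suid,
 * which is 8 bytes on the wire. */
enum { WIRE_SIZE = 8 };

/* Naive overlay: the compiler is free to insert padding so that
 * uid lands on a boundary suitable for uint32_t. */
struct overlay {
    uint16_t tag;
    uint32_t uid;
    uint16_t suid;
};
```

On common ABIs where uint32_t requires 4-byte alignment, offsetof (struct overlay, uid) is typically 4 rather than 2, and sizeof (struct overlay) is typically 12 rather than WIRE_SIZE, so the struct no longer matches the bytes in the buffer.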

Using structure packing is safer - the compiler knows which memory accesses are permitted and which are not, and will generate smaller reads and writes so as not to perform unaligned accesses. However, the mechanism for enabling packing is compiler specific (for example #pragma pack, or __attribute__((packed)) in GCC), so it is not portable either.
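As a rough illustration, a packed variant might look like the sketch below. It uses the GCC/Clang __attribute__((packed)) syntax, so it is compiler specific by design. The struct, its tag field and the parse_wire helper are hypothetical and not taken from the question. The bytes are copied into a local struct with memcpy rather than casting the buffer pointer, which also sidesteps aliasing concerns.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohl, ntohs (POSIX) */

/* GCC/Clang-specific packing; other compilers need a different spelling.
 * Hypothetical layout: 2-byte tag, 4-byte uid, 2-byte suid. */
struct __attribute__((packed)) wire_fields {
    uint16_t tag;
    uint32_t uid;
    uint16_t suid;
};

/* Copy the raw bytes into a packed struct, then convert each field
 * from network to host byte order. */
static struct wire_fields parse_wire(const unsigned char *p)
{
    struct wire_fields w;
    memcpy(&w, p, sizeof w);
    w.tag  = ntohs(w.tag);
    w.uid  = ntohl(w.uid);
    w.suid = ntohs(w.suid);
    return w;
}
```

Because the struct is packed, the compiler knows uid may be misaligned and emits accesses that are safe on the target.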

The only truly portable implementation choice is byte-wise accesses (and shifts to coalesce the data).
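A minimal sketch of the byte-wise approach, assuming the fields are in network (big-endian) byte order as the ntohl/ntohs calls in the question imply. The helper names load_be32 and load_be16 are made up for illustration:

```c
#include <stdint.h>

/* Assemble a big-endian 32-bit value one byte at a time. Only byte
 * loads are issued, so the alignment of p does not matter. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}

/* Same idea for a big-endian 16-bit value. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}
```

With these, the question's reads would become uid = load_be32 (buffer + sizeof (struct pktheader)) and suid = load_be16 (buffer + sizeof (struct pktheader) + sizeof (uint32_t)). The shifts already produce a host-order value, so no ntohl or ntohs call is needed.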

Upvotes: 1
