gowrath

Reputation: 3224

Using Contiguous Memory of C Struct Members

Before you mark this as duplicate, please do read the question.

This may be a very stupid question, but it is bothering me. I know, from reading the standard as well as many other SO questions, that the fields of a struct in C are not guaranteed to be contiguous because the compiler may add padding. For example, according to the C standard:

13/ Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

I was writing a program similar to the Unix readelf and nm just for fun, and it involves a lot of work reading bytes at specific offsets into the file. For example, the first 64 bytes of an object file contain the "file header". The file header's bytes 0x00-0x04 encode an int, while 0x20-0x28 encode a pointer, etc. However, I noticed in the original implementation of readelf.c that the programmer does something like this:

First, they declare a struct (let's call it ELF_H) with fields corresponding to the things in the file header (i.e. the first field is an int, just as the first 4 bytes in the file header are; the second is a char because bytes 0x04-0x05 in the ELF header encode a char, etc.). Then they copy the entire ELF file into memory and cast the pointer to the start of this memory to a pointer to ELF_H. Something like:

FILE *file = fopen("filename", "rb");
void *start_of_file = malloc(/* size_of_file */);
fread(start_of_file, 1, /* size_of_file */, file);  // copies entire file into memory
ELF_H hdr = *(ELF_H *) start_of_file;               // cast to a pointer to the struct type and dereference

After doing this, they access each part of the header through the member variables of the struct. So instead of fetching what is supposed to be at byte 0x04 using pointer arithmetic, they just write hdr.member2 (the second member of the struct, which follows the first one, an int).

How is this meant to work if fields in a struct aren't guaranteed to be contiguous?

The closest answer I could find to this was here but in that example, the members of the struct are of the same type. In the ELF_H, they are of different types.

Thank you in advance :)

Upvotes: 4

Views: 3327

Answers (5)

gowrath

Reputation: 3224

Interestingly enough, I found an answer to this in the ELF reference specification (page 2).

According to it:

All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of 4, and so on. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file.

This is specifically for a 32-bit architecture, but I am sure the same concept applies to 64-bit systems. So it seems that all the data structures defined for ELF files are laid out with alignment in mind, so that a struct can represent them directly.
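One way to check this on a given compiler is to mirror the header as a struct and assert the offsets at compile time. The following is only a minimal sketch: `demo_elf64_header` is a hand-written stand-in for the 64-bit file header (real code would use the definitions in elf.h), and `demo_read_header` is a hypothetical helper. It copies with memcpy instead of casting the buffer pointer, which avoids alignment and aliasing concerns about the buffer itself.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Hand-written mirror of the 64-bit ELF file header, for illustration only.
   Real programs would use Elf64_Ehdr from <elf.h>. */
struct demo_elf64_header {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

/* Every field already sits on a multiple of its own size, so a compiler
   that pads only for alignment has nothing to insert. These checks fail
   the build if that assumption is wrong for the current target. */
static_assert(offsetof(struct demo_elf64_header, e_type) == 0x10, "e_type");
static_assert(offsetof(struct demo_elf64_header, e_version) == 0x14, "e_version");
static_assert(offsetof(struct demo_elf64_header, e_entry) == 0x18, "e_entry");
static_assert(offsetof(struct demo_elf64_header, e_phoff) == 0x20, "e_phoff");
static_assert(offsetof(struct demo_elf64_header, e_flags) == 0x30, "e_flags");
static_assert(sizeof(struct demo_elf64_header) == 64, "header size");

/* Hypothetical helper: copy the header out of a file image.
   Returns 0 on success, -1 if the buffer is too short. */
static int demo_read_header(const unsigned char *buf, size_t len,
                            struct demo_elf64_header *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

If any of the assertions fails, the struct does not match the file layout on that target and the fields would have to be read individually by offset instead.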

Thank you all for your answers; they were exceptionally helpful!

Upvotes: 1

David

Reputation: 1694

How is this meant to work if fields in a struct aren't guaranteed to be contiguous?

The standard doesn't require structs to be contiguous, but this doesn't mean that structs are laid out at random or in unpredictable ways. The specific compiler and linker being used will always generate the binary in a specified way, as dictated by the Application Binary Interface or ABI. It just so happens that on a GNU/Linux machine, the ELF ABI exactly corresponds to how GCC will lay out and access that struct.

In other words, you can predict whether the method you describe will work for any given ABI / compiler / linker combination. It's not guaranteed to work by the standard, but it might be guaranteed to work by the compatibility of ABIs.

Upvotes: 2

Karim Manaouil

Reputation: 1249

In short, padding is only added to keep values aligned on their natural (32/64-bit) boundaries. If you look at the structures in elf.h (the ELF header, program header, and section header structures), you will notice that the fields are already aligned for their architecture. You can therefore copy the content from the file into memory, cast the buffer to whatever structure you want, and then access the values through it.

By coincidence, I'm developing a tool like yours (I'm trying to make a tool that combines the functionality of readelf and objdump). I've made significant progress and I'm willing to share the project on GitHub.
You may want to join me to develop it further (contact me at [email protected]).

Upvotes: 0

ShadowRanger

Reputation: 155497

If the data in the file was written from a padded struct of the form being read, then the padding is irrelevant; the file contains padding as does the memory representation.

It's true the standard isn't particularly restrictive, and a compiler could insert random padding in the ELF reader struct that the tool that wrote the ELF didn't match. But in practice, the "unnamed padding" is for alignment purposes, and all major compilers have predictable behavior there; they pad to align the fields to match their type. So int fields (on systems with four byte int) are preceded by 1-3 pad bytes if the previous field didn't end on a four byte boundary, char fields get no padding, etc. In this case, no compiler I know of would insert padding between a leading int field and a following char[2], because char has no required alignment anyway.
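The two cases described above can be checked directly with offsetof. This is a small sketch with made-up struct names; the exact numbers are what mainstream ABIs produce, not something the standard promises.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Leading int followed by char[2]: char has alignment 1, so no padding
   is needed between the two members. */
struct int_then_chars {
    int  magic;
    char tag[2];
};

/* Leading char followed by int: the compiler pads after 'flag' so that
   'value' starts on a boundary suitable for int. */
struct char_then_int {
    char flag;
    int  value;
};

static size_t tag_offset(void)   { return offsetof(struct int_then_chars, tag); }
static size_t value_offset(void) { return offsetof(struct char_then_int, value); }
```

On a typical system with four-byte int, tag_offset() is 4 (no gap after the int) and value_offset() is also 4 (three pad bytes after the char).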

It's also possible to use non-standard compiler extensions to prevent padding to align fields in the struct, but it's not necessary if your struct definition would never have an unaligned field anyway (because you always put smaller fields after larger fields, or because you always group smaller fields together to maintain the alignment requirements of subsequent larger fields).
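The effect of field ordering is easy to see by comparing sizeof for two arrangements of the same members. This is a sketch with invented struct names, assuming the usual natural-alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small fields interleaved with larger ones: padding is typically
   needed before 'b' and before 'd'. */
struct scattered {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint16_t d;
};

/* Same members, largest first: each field already lands on a multiple
   of its own size, so no padding is needed for alignment. */
struct grouped {
    uint32_t b;
    uint16_t d;
    uint8_t  a;
    uint8_t  c;
};

static size_t scattered_size(void) { return sizeof(struct scattered); }
static size_t grouped_size(void)   { return sizeof(struct grouped); }
```

With the grouped ordering, the in-memory layout matches a byte-for-byte file layout without any packing extension.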

Upvotes: 3

Daniel Tran

Reputation: 6169

You can make the fields of a struct contiguous by disabling struct padding. For GCC it should be:

typedef struct Port
{
    uint32_t reg0;
    uint32_t reg1;
    uint32_t reg2;
} __attribute__((__packed__)) Port;

For VS C++:

#pragma pack(push, 1)

typedef struct Port
{
    uint32_t reg0;
    uint32_t reg1;
    uint32_t reg2;
} Port;

#pragma pack(pop)
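To see what packing actually changes, compare the same members with and without it. This is a sketch using `#pragma pack`, which GCC and Clang accept as well as MSVC; the struct names are made up. Port above has only uint32_t members, so it has no padding to remove in the first place; the difference shows up with mixed sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: 'value' is aligned, so padding usually follows 'tag'. */
struct natural_record {
    uint8_t  tag;
    uint32_t value;
};

#pragma pack(push, 1)
/* Packed layout: members are placed back to back with no padding. */
struct packed_record {
    uint8_t  tag;
    uint32_t value;
};
#pragma pack(pop)

static size_t natural_size(void) { return sizeof(struct natural_record); }
static size_t packed_size(void)  { return sizeof(struct packed_record); }
```

Packed members can be misaligned, so it is safer to access them through the struct than through a pointer to the member.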

Upvotes: 1
