sunnyleevip

Reputation: 164

How to define struct 'Atom' in C? Which is better? Why?

An atom is a pointer to a unique, constant string. A string in C must end with '\0'.

I will show two ways to define an 'ATOM TABLE' structure in 'C':

struct atom1 {
    struct atom1 *link;
    int len;
    char *str;
} *bucket[2048];

and

struct atom2 {
    struct atom2 *link;
    int len;
    char str[1];
} *bucket[2048];

So, when I want to allocate memory for these two types of atom, I also have two ways.

// memory + 1 for '\0'
struct atom1 *p = malloc(sizeof(*p) + len + 1);

and

// space for '\0' is already included in the definition of struct atom2
struct atom2 *p = malloc(sizeof(*p) + len);

So when allocating memory, 'atom2' looks better. On the other hand, accessing the string beyond its first byte breaks the rules of C, because 'atom2' declares 'char str[1];'.

Is 'atom2' really good?

Upvotes: 1

Views: 286

Answers (2)

davmac

Reputation: 20631

Since C99 you have the option of using flexible array members - that is, the same as your atom2 but without the array size specified:

struct atom2 {
    struct atom2 *link;
    int len;
    char str[];
} *bucket[2048];

This way you get the benefit of being able to allocate the string and structure together, without violating language rules.

In this case when you allocate memory, be sure to account for the nul string terminator:

struct atom2 *p = malloc(sizeof(*p) + len + 1);

(Note also that you never need to cast the result of malloc in C.)
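As a rough illustration of the flexible array member allocation above, here is a minimal sketch. The helper name atom_new is hypothetical, and the length field is declared as size_t rather than int purely as an assumption for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct atom {
    struct atom *link;
    size_t len;
    char str[];              /* flexible array member (C99) */
};

/* atom_new is a hypothetical helper, not part of the question.
 * It copies len bytes from s into a single allocation that holds
 * both the struct and the string, and adds the terminating '\0'. */
struct atom *atom_new(const char *s, size_t len)
{
    assert(s != NULL);

    struct atom *p = malloc(sizeof *p + len + 1);  /* +1 for '\0' */
    if (p == NULL)
        return NULL;

    p->link = NULL;
    p->len = len;
    memcpy(p->str, s, len);
    p->str[len] = '\0';
    return p;
}
```

Because the struct and the string share one allocation, a single free(p) releases both.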

Note that your atom1 structure contains a char * which is semantically different to a char []. The pointer is a data member which occupies storage and which can be made to point anywhere, whereas the flexible array member does not occupy any storage (other than what you allocate for it explicitly) and always trails the rest of the object. To use atom1 you'd need to allocate storage for the struct object and the string separately:

struct atom1 *p = malloc(sizeof(*p));
p->str = malloc(len + 1);

The allocation you suggested in your question:

struct atom1 *p = malloc(sizeof(*p) + len + 1);

... would at least require that you set up the pointer, p->str, to point at the correct place (something like p->str = ((char *) p) + sizeof(*p)) but I'm not certain that you wouldn't be invoking undefined behaviour if you tried to store the string at that location.
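For comparison, the two-allocation approach for atom1 described above might look like the following sketch. The helper names atom1_new and atom1_free are hypothetical, and the code assumes len is non-negative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct atom1 {
    struct atom1 *link;
    int len;
    char *str;               /* points at separately allocated storage */
};

/* Hypothetical helper: allocates the struct and the string separately. */
struct atom1 *atom1_new(const char *s, int len)
{
    assert(s != NULL && len >= 0);

    struct atom1 *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;

    p->str = malloc((size_t)len + 1);   /* +1 for '\0' */
    if (p->str == NULL) {
        free(p);                        /* avoid leaking the struct */
        return NULL;
    }

    memcpy(p->str, s, (size_t)len);
    p->str[len] = '\0';
    p->len = len;
    p->link = NULL;
    return p;
}

/* Hypothetical helper: two allocations require two calls to free. */
void atom1_free(struct atom1 *p)
{
    if (p != NULL) {
        free(p->str);
        free(p);
    }
}
```

The cost of this layout is an extra allocation and an extra pointer indirection per atom, which is what the flexible array member version avoids.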

Upvotes: 5

Lundin

Reputation: 213920

atom1 doesn't make any sense, because you should allocate memory dynamically for what str points at, not for the whole struct. As the code currently stands, there is no sound way in which you will be able to use atom1.

atom2 invokes undefined behavior. This was known as the "struct hack" in the old C standard, and was never guaranteed to work. Writing out of bounds of the fixed array is not allowed, even though you may have allocated extra memory at the end of the struct, because you don't know where the struct ends: it could have trailing padding bytes.
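To make the padding point concrete, the following sketch measures how many bytes lie between the start of str and the end of the struct. The function name atom2_tail_bytes is hypothetical, and the result is implementation-defined: it is at least 1, and may be larger where the compiler adds trailing padding.

```c
#include <assert.h>
#include <stddef.h>

struct atom2 {
    struct atom2 *link;
    int len;
    char str[1];
};

/* Hypothetical helper: number of bytes from the start of str to the
 * end of the struct, counting str[0] itself plus any trailing padding. */
size_t atom2_tail_bytes(void)
{
    /* The struct must at least contain the one declared array element. */
    assert(sizeof(struct atom2) >= offsetof(struct atom2, str) + 1);
    return sizeof(struct atom2) - offsetof(struct atom2, str);
}
```

If this value is greater than 1, then malloc(sizeof(*p) + len) reserves more bytes than the string needs, so the size arithmetic of the "struct hack" does not mean what it appears to mean.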

Is 'atom2' really good?

Neither method is good; don't use either of them. In modern C (C99 and later), you can do this safely by using a flexible array member:

typedef struct atom3 
{
  struct atom3* link;
  size_t        length;
  char          str[];
} atom3_t;

And then allocate memory as:

atom3_t* p = malloc(sizeof(*p) + length + 1);

After that, you can safely use str as if it were an array of size length + 1.
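Putting this together with the bucket array from the question, an atom table lookup could be sketched roughly as follows. The function names atom_intern and hash_bytes, the NBUCKETS macro, and the choice of hash are all assumptions made for illustration; nothing in the question specifies them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 2048        /* same size as bucket[] in the question */

typedef struct atom3 {
    struct atom3 *link;
    size_t        length;
    char          str[];
} atom3_t;

static atom3_t *bucket[NBUCKETS];

/* Placeholder hash (multiply by 33 and add); an assumption only. */
static size_t hash_bytes(const char *s, size_t length)
{
    size_t h = 5381;
    for (size_t i = 0; i < length; i++)
        h = h * 33 + (unsigned char)s[i];
    return h % NBUCKETS;
}

/* Hypothetical helper: returns the single stored copy of the string,
 * creating it on first use. Returns NULL if allocation fails. */
const char *atom_intern(const char *s, size_t length)
{
    assert(s != NULL);
    size_t h = hash_bytes(s, length);

    /* Reuse an existing atom if the same bytes are already stored. */
    for (atom3_t *p = bucket[h]; p != NULL; p = p->link)
        if (p->length == length && memcmp(p->str, s, length) == 0)
            return p->str;

    atom3_t *p = malloc(sizeof(*p) + length + 1);
    if (p == NULL)
        return NULL;

    p->length = length;
    memcpy(p->str, s, length);
    p->str[length] = '\0';
    p->link = bucket[h];     /* push onto the front of the chain */
    bucket[h] = p;
    return p->str;
}
```

Since equal strings always map to the same stored copy, two atoms can be compared by pointer equality instead of strcmp.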

Upvotes: 0
