user12553460

Why do we need to declare a variable of union type when nested in a structure in C?

I have a code sample from a tutorial, which says

struct goods {
    char name[20];
    union quantity {
        int count;
        float weight, volume;
    } q;
};

I can't figure out why we need to declare the 'q' variable along with the union type name 'quantity'. Why can't we get away with just 'quantity' and then access the struct fields via a dot?


Update: Is it correct that 'quantity' is the name/tag of a union type, while 'q' is not a variable but rather the name of a union member/field which contains sub-members (count, weight, volume)?

Upvotes: 0

Views: 821

Answers (2)

Eric Postpischil

Reputation: 222536

It is not clear what specific question you are asking about this code, so let’s review the issues.

Members of Structures

When a union declaration appears inside a struct declaration, it is usually declaring a member of that structure that is a union. This member is a part of the structure the same as any other member, such as an int x declared in the structure. Every instance of the structure contains an instance of each of its members, including the union—the union is part of the structure, not a separate thing.

Names

In this code:

    union quantity {
        int count;
        float weight, volume;
    } q;

the identifier quantity is a tag for the union. In this role, it must appear after a union keyword, always as union quantity. It only names the union type; it does not name any union object or member of a structure. (The same identifier can be used in multiple roles. We could also add a declaration that defined quantity to be a type or an object or member, and then it would have two roles: It could be used as union quantity to refer to the union type, and it could be used by itself to refer to whatever the other declaration declared.)

In the same code above, q is the name of a member of a structure. It is the name for that union quantity object that is in each instance of the struct goods.

With this declaration, if we define a struct goods G;, then G.q refers to the union quantity that is in G, and G.q.count, G.q.weight, and G.q.volume refer to the members in the union G.q. (Only one of those members can be stored at a time, because they all overlap in a union.)
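A minimal sketch of these accesses, using the struct goods declaration from the question. make_counted is a hypothetical helper added only to give the member accesses somewhere to live; it is not part of the tutorial code.

```c
#include <assert.h>
#include <string.h>

struct goods {
    char name[20];
    union quantity {
        int count;
        float weight, volume;
    } q;
};

/* Hypothetical helper: builds a struct goods whose quantity is a count. */
struct goods make_counted(const char *name, int count) {
    struct goods G;
    /* Copy at most 19 characters and always terminate the string. */
    strncpy(G.name, name, sizeof G.name - 1);
    G.name[sizeof G.name - 1] = '\0';
    G.q.count = count;   /* G.q is the union; .count selects one of its members */
    return G;
}
```

Writing G.q.weight afterwards replaces the stored count, because count, weight, and volume occupy the same storage inside G.q.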

Anonymous Unions

In C 2011, a new feature was added. A union or structure could be declared inside another union or structure without a member name:

struct goods {
    char name[20];
    union {
        int count;
        float weight, volume;
    };
};

This does not change the layout of the structure at all; it still has the same members. However, their names are different. Given a struct goods G, we can refer to the count member as G.count instead of G.q.count, and similarly for weight and volume. (Note that, in addition to removing the member name q, this code also removed the tag quantity. The C standard says that for a structure or union to be anonymous, it must have neither a tag nor a member name. I do not see a technical reason for this. Perhaps it was a choice to avoid errors where member names are inadvertently left out.)
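A sketch of the anonymous form in use, assuming a compiler in C11 mode or later. set_weight is a hypothetical helper for illustration only.

```c
#include <assert.h>

/* Requires C11 or later: the union has neither a tag nor a member name. */
struct goods {
    char name[20];
    union {
        int count;
        float weight, volume;
    };
};

/* Hypothetical helper: the union's members are reached as if they were
   direct members of struct goods, with no intermediate name like q. */
float set_weight(struct goods *g, float w) {
    g->weight = w;
    return g->volume;   /* weight and volume are the same float storage */
}
```

Because weight and volume are both float members of the same union, reading g->volume right after writing g->weight yields the value just written.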

As to why somebody might give a union member a name rather than make it anonymous, one reason is that the code was written before 2011, or after 2011 but for use with C implementations that did not yet support anonymous members. Another reason is that the author wanted to distinguish the union members, so anybody reading or writing the code would be alert to the fact that these members were inside something inside the structure, not regular direct structure members.

Upvotes: 3

John Bollinger

Reputation: 180181

I cant figure out why do we need to declare 'q' variable along with a union type name 'quantity'?

As presented in the question, struct goods is a structure type with two members: an array of 20 char identified by name and a union quantity identified by q (so yes, quantity is a union tag, not the name of a member). There is no need in any absolute sense to declare it that way, but such a declaration provides a few characteristics that other alternatives do not. Do understand, however, that as declared in the example, count, weight, and volume are not members of struct goods. Rather, they are members of q, a union that is a member of struct goods.

Why can't we get away with just 'quantity' and then access struct fields via dot?

Because that is not one of the alternatives that C syntax affords. In the member list of a structure type declaration, a union tag (quantity in this case) can appear only in the declaration of a named member, so if that is provided then you must also declare an identifier for the union -- q in the example. And having declared the union as a named member, you must access its members via the union's identifier.

On the other hand, you may omit the tag, and if you do, then, optionally, you may also omit the union's identifier. If you do omit the identifier (and only in that case), you have an "anonymous union member" whose own members are accessed as if they were actually members of the containing structure. That's pretty close to what you ask.

Do note that any way around, the members of the union share storage with each other, so the union contains only one of them at any given time. They do not share storage with other members of the containing structure.
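One way to check both halves of this claim is to compare addresses and offsets. This is an illustrative sketch; the two helper functions are hypothetical. The standard guarantees that every union member starts at the union's own address; the exact offset of q may include padding, so the check only requires that q begins after the end of name.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

struct goods {
    char name[20];
    union quantity {
        int count;
        float weight, volume;
    } q;
};

/* All members of the union begin at the same address, so they overlap. */
int union_members_overlap(const struct goods *g) {
    return (const void *)&g->q.count == (const void *)&g->q.weight
        && (const void *)&g->q.weight == (const void *)&g->q.volume;
}

/* The union as a whole is placed after name, so it does not share storage
   with the other structure member. sizeof does not evaluate g->name. */
int union_follows_name(const struct goods *g) {
    return offsetof(struct goods, name) + sizeof g->name
        <= offsetof(struct goods, q);
}
```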

With that said, the various options do have some differences in their characteristics. In the first place, do appreciate that all of these forms have twofold significance: they declare a union type, and they declare a structure member of that type. That's relevant because if you provide a tag then you can declare other objects of the same union type wherever the union declaration is in scope. Moreover, that scope is not limited to the structure type declaration that contains it, so with the declaration presented, one could do something like this:

void set_quantity(struct goods *g, union quantity quant) {
    g->q = quant;
}

That is impossible for untagged unions.
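A fuller sketch that exercises set_quantity. weigh is a hypothetical helper that builds a union quantity object outside any structure, which is possible only because the union has a tag in scope.

```c
#include <assert.h>

struct goods {
    char name[20];
    union quantity {
        int count;
        float weight, volume;
    } q;
};

/* From the answer: the tag lets union quantity be used as a parameter type. */
void set_quantity(struct goods *g, union quantity quant) {
    g->q = quant;
}

/* Hypothetical helper: fills a standalone union quantity with a weight,
   then stores it into the structure through set_quantity. */
float weigh(struct goods *g, float kg) {
    union quantity quant;
    quant.weight = kg;
    set_quantity(g, quant);
    return g->q.weight;
}
```

Since C99, a compound literal with a designated initializer can also supply the union inline, for example set_quantity(&g, (union quantity){ .count = 3 });.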

There is also at least one important distinction between a named member with an untagged union type and an anonymous union member: you can access the union itself only if it is named. Consider this:

struct goods2 {
    char name[20];
    union {
        int count;
        float weight, volume;
    } q;
};

void copy_quantity(struct goods2 *dest, struct goods2 *src) {
    dest->q = src->q;
}

Not only can you not do that with an anonymous union member, you cannot do anything reliably equivalent. In particular, even if you were willing to suffer the inefficiency that would be associated with copying src->count, src->weight, and src->volume individually despite only one of them actually containing a value, C provides no promise that doing so in any order would reliably achieve the desired result.

Upvotes: 0
