Some Name

Reputation: 9521

Initializing a union of two structs with a common initial sequence

QUESTION: If a union contains two structs with a common initial sequence of compatible types, is the behavior well-defined if we initialize one part of the initial sequence through one struct and the rest of it through the other struct?

Consider the following code fragment:

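#include <stdio.h>
#include <stdlib.h>

union u_t{
    struct {
        int i1;
        int i2;
    } s1;

    struct {
        int j1;
        int j2;
    } s2;
};

int main(void){
    union u_t *u_ptr = malloc(sizeof(*u_ptr));
    if (u_ptr == NULL)
        return 1;
    u_ptr -> s1.i1 = 10; // first member written through s1
    u_ptr -> s2.j2 = 11; // second member written through s2

    printf("%d\n", u_ptr -> s2.j1 + u_ptr -> s1.i2); //prints 21
    free(u_ptr);
}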


The question is whether printing 21 is well-defined behavior. The Standard N1570 6.5.2.3(p6) specifies the following:

if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.

So it would be ok to inspect the common initial sequence (in this case the whole struct). But the problem is that in this case the union seems to contain the s2 object with j2 being the only initialized member.

I think we end up with unspecified behavior: only s2.j2 was initialized and s2.j1 was not, so s2.j1 should contain an unspecified value.

Upvotes: 3

Views: 333

Answers (2)

P.W

Reputation: 26800

The C11 standard (n1570) states in the footnote on [6.5 Expressions]/6 that:

Allocated objects have no declared type.

And [6.5 Expressions]/6 states that:

6 The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

And you are also following the stated rules in [6.5 Expressions]/7 when you access the stored values for printing in the printf statement.

This, combined with the passage you cited from N1570 6.5.2.3(p6), which provides "One special guarantee is made in order to simplify the use of unions", makes the behavior well-defined.

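On the practical side, the generated assembly shows what actually happens: 10 is stored at offset 0 and 11 at offset 4 of the allocated block (the lines marked Here), and the same two locations are then loaded and added.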

        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     eax, 8
        mov     edi, eax
        call    malloc
        mov     qword ptr [rbp - 8], rax # Here
        mov     rax, qword ptr [rbp - 8] # Here
        mov     dword ptr [rax], 10      # Here
        mov     rax, qword ptr [rbp - 8] # Here
        mov     dword ptr [rax + 4], 11  # Here
        mov     rax, qword ptr [rbp - 8]
        mov     ecx, dword ptr [rax]
        mov     rax, qword ptr [rbp - 8]
        add     ecx, dword ptr [rax + 4]
        movabs  rdi, offset .L.str
        mov     esi, ecx
        mov     al, 0
        call    printf
        xor     ecx, ecx
        mov     dword ptr [rbp - 12], eax # 4-byte Spill
        mov     eax, ecx
        add     rsp, 16
        pop     rbp
        ret
.L.str:
        .asciz  "%d\n"

Upvotes: 1

Lundin

Reputation: 213701

Regarding aliasing:

The common initial sequence is only concerned with aliasing of the two struct types. That's not a problem here: your two structs are even compatible types, so pointers to them may alias without using any tricks. Dissecting C11 6.2.7:

6.2.7 Compatible type and composite type
Two types have compatible type if their types are the same. /--/ Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements:

If one is declared with a tag, the other shall be declared with the same tag.

Neither struct is declared with a tag here.

If both are completed anywhere within their respective translation units, then the following additional requirements apply:

They are both completed (defined).

there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types;

This holds true for these structs.

if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name.

Alignment specifiers do not apply.

For two structures, corresponding members shall be declared in the same order.

This holds true.

The conclusion is that both of your structs are of compatible types, meaning that you don't need any tricks like the common initial sequence. The strict aliasing rule simply states (6.5/7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,

This is the case here.

Furthermore, as mentioned in other answers, the effective type of the actual data here is int: allocated storage has no effective type until a value is stored, at which point it takes the type of the lvalue used for the store. This too means that the compiler cannot assume that the pointers won't alias.

Furthermore, the strict aliasing rule gives an exception for lvalue access of members of structs and unions:

an aggregate or union type that includes one of the aforementioned types among its members

And then you have the common initial sequence on top of that. As far as aliasing goes, this is as well-defined as can be.


Regarding type punning:

Your actual concern does not seem to be aliasing, but type punning through unions. This is vaguely guaranteed by C11 6.5.2.3/3:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue.

That's the normative text and it's badly written; nobody can understand how programs/compilers are supposed to behave based on this. The informative footnote 95) explains it well:

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

In your case, this means that you trigger a type conversion from one struct type to another compatible struct type. That's perfectly safe, since they are the very same type and issues with alignment or trap representations do not apply.

Please note that C++ is different here.

Upvotes: 4
