Reputation: 9540
I'm trying to understand how type-punning works when it comes to storing a value into a member of a structure or union.
The Standard N1570 6.2.6.1(p6) specifies that:
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
So I interpreted it as follows: if we store into a member an object whose size equals sizeof(declared_type_of_the_member) + padding, the bytes related to the padding will have unspecified values (even though those bytes were defined in the original object). Here is an example:
```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct first_member_padded_t {
    int a;
    long b;
};

int main(void) {
    int a = 10;
    struct first_member_padded_t s;
    char repr[offsetof(struct first_member_padded_t, b)] = { 0 }; // some value (zeros here)
    memcpy(repr, &a, sizeof(a));
    memcpy(&(s.a), repr, sizeof(repr));
    s.b = 100;
    printf("%d%ld\n", s.a, s.b); // prints 10100
}
```
On my machine sizeof(int) = 4 and offsetof(struct first_member_padded_t, b) = 8.
Is the behavior of printing 10100 well defined for such a program? I think that it is.
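A minimal sketch for checking the layout figures on another implementation. The struct is copied from the question; the helper name padding_after_a is hypothetical. The result is implementation-defined: the 4 padding bytes implied by the numbers above (8 - 4) hold only on that machine.

```c
#include <stddef.h>

struct first_member_padded_t {
    int a;
    long b;
};

/* Number of padding bytes between a and b on this implementation.
   The question's machine gives 8 - 4 = 4, but other targets may give 0. */
static size_t padding_after_a(void) {
    return offsetof(struct first_member_padded_t, b) - sizeof(int);
}
```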
Upvotes: 3
Views: 179
Reputation: 81247
In many implementations of the language the C Standard was written to describe, an attempt to write an N-byte object within a struct or union would affect the value of at most N bytes within the struct or union. On the other hand, on a platform which supported 8-bit and 32-bit stores, but not 16-bit stores, if someone declared a type like:
```c
struct S { uint32_t x; uint16_t y; } *s;
```
and then executed s->y = 23; without caring about what happened to the two bytes following y, it would be faster to perform a 32-bit store to y, blindly overwriting the two bytes following it, than to perform a pair of 8-bit writes to update the upper and lower halves of y. The authors of the Standard didn't want to forbid such treatment.
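The following is a hypothetical byte-level model of the two code-generation choices described above, not what any particular compiler emits. It assumes struct S occupies 8 bytes with y at offset 4 and two bytes of trailing padding, and it models byte order as little-endian; the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed layout of struct S { uint32_t x; uint16_t y; }: 8 bytes,
   y at offset 4, bytes 6..7 are padding. Little-endian is assumed. */
enum { S_SIZE = 8, Y_OFFSET = 4 };

/* A pair of 8-bit stores: only y's own two bytes change. */
static void store_y_narrow(unsigned char mem[S_SIZE], uint16_t v) {
    mem[Y_OFFSET]     = (unsigned char)(v & 0xFF);
    mem[Y_OFFSET + 1] = (unsigned char)(v >> 8);
}

/* One 32-bit store: y gets its value, and the two bytes after y are
   overwritten with the upper half of the register (zero in this model). */
static void store_y_wide(unsigned char mem[S_SIZE], uint16_t v) {
    uint32_t reg = v;
    for (int i = 0; i < 4; ++i)
        mem[Y_OFFSET + i] = (unsigned char)(reg >> (8 * i));
}
```

Starting from a buffer filled with 0xAA, both functions leave y equal to 23 after storing 23, but only the wide store changes the two padding bytes.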
It would have been helpful if the Standard had included a means by which implementations could indicate whether writes to structure or union members might disturb storage beyond them, and programs that would be broken by such disturbance could refuse to run on implementations where it could occur. The authors of the Standard, however, likely expected that programmers who would be interested in such details would know what kinds of hardware their program was expected to run on, and thus know whether such memory disturbances would be an issue on such hardware.
Unfortunately, modern compiler writers seem to interpret freedoms that were intended to assist implementations for unusual hardware as an open invitation to get "creative" even when targeting platforms that could process code efficiently without such concessions.
Upvotes: 1
Reputation: 223702
The question is poorly posed. Let’s look first at the code:
```c
char repr[offsetof(struct first_member_padded_t, b)] = { 0 }; // some value (zeros here)
memcpy(repr, &a, sizeof(a));
memcpy(&(s.a), repr, sizeof(repr));
```
First note that repr is initialized, so all the elements in it are given values.
The first memcpy is fine: it copies the bytes of a into repr.
If the second call were memcpy(&s, repr, sizeof repr), it would copy bytes from repr into s. This would write bytes into s.a and, due to the size of repr, into any padding between s.a and s.b. Per C 2018 6.5 7 and other parts of the standard, it is permitted to access the bytes of an object (and “access” means both reading and writing, per 3.1 1). So this copy into s is fine, and it results in s.a taking on the same value that a has.
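A minimal sketch of the copy described above as clearly well defined, reusing the question's struct and values; the function name fill_via_struct_address is hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct first_member_padded_t {
    int a;
    long b;
};

/* Copies the bytes of a (plus zeros for any padding before b) into s
   through &s, then assigns b normally. */
static struct first_member_padded_t fill_via_struct_address(int a) {
    struct first_member_padded_t s;
    char repr[offsetof(struct first_member_padded_t, b)] = { 0 };
    memcpy(repr, &a, sizeof a);    /* bytes of a; the rest of repr stays 0 */
    memcpy(&s, repr, sizeof repr); /* writes s.a and any padding before b */
    s.b = 100;
    return s;
}
```

With a = 10, the returned structure has a == 10 and b == 100, the same values the question prints.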
However, the second memcpy uses &(s.a) rather than &s; that is, it uses the address of s.a rather than the address of s. We know that converting &s.a to a pointer to a character type would allow us to access the bytes of s.a (6.5 7 and more), and passing it to memcpy has the same effect as such a conversion, since memcpy is specified to have the effect of copying bytes. But it is not clear that this allows us to access other bytes in s. In other words, the question is whether we can use &s.a to access bytes other than those in s.a.
6.7.2.1 15 tells us that, if a pointer to the first member of a structure is “suitably converted,” the result points to the structure. So, if we converted &s.a to a pointer to struct first_member_padded_t, it would point to s, and we can certainly use a pointer to s to access all the bytes in s. Thus, this would also be well defined:
```c
memcpy((struct first_member_padded_t *) &s.a, repr, sizeof repr);
```
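A sketch of the explicitly converted form, assuming the question's struct; copy_via_converted_pointer is a hypothetical helper, and n is expected not to exceed sizeof *s.

```c
#include <stddef.h>
#include <string.h>

struct first_member_padded_t {
    int a;
    long b;
};

/* Converts &s->a to a pointer to the whole structure (6.7.2.1 15),
   then copies n bytes through that pointer. Requires n <= sizeof *s. */
static void copy_via_converted_pointer(struct first_member_padded_t *s,
                                       const char *repr, size_t n) {
    struct first_member_padded_t *whole = (struct first_member_padded_t *) &s->a;
    memcpy(whole, repr, n);
}
```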
However, memcpy(&s.a, repr, sizeof repr) only converts &s.a to void * (memcpy is declared to take a void *, so &s.a is automatically converted during the function call), not to a pointer to the structure type. Is that a suitable conversion? Note that if we did memcpy(&s, repr, sizeof repr), it would convert &s to void *. 6.2.5 28 tells us that a pointer to void has the same representation as a pointer to a character type. So consider these two statements:
```c
memcpy(&s.a, repr, sizeof repr);
memcpy(&s, repr, sizeof repr);
```
Both of these statements pass a void * to memcpy, and those two void * values have the same representation as each other and point to the same byte. Now, we might interpret the standard so pedantically and strictly that they differ: the latter may be used to access all the bytes of s and the former may not. But then we would have two necessarily identical pointers that behave differently, which is bizarre.
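The pointer equality this argument relies on can be checked directly. This sketch (the helper name is hypothetical) only shows that the two void * values compare equal, which 6.5.9 guarantees for a pointer to an object and a pointer to a subobject at its beginning; it cannot show whether an implementation treats them differently for the purpose of accessing bytes, which is the open question here.

```c
struct first_member_padded_t {
    int a;
    long b;
};

/* Performs the same implicit conversions to void * that the two memcpy
   calls perform, and compares the results. */
static int void_pointers_equal(struct first_member_padded_t *s) {
    void *from_member = &s->a; /* as in memcpy(&s.a, ...) */
    void *from_struct = s;     /* as in memcpy(&s, ...) */
    return from_member == from_struct;
}
```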
Such a severe interpretation of the C standard seems possible in theory (the difference between the pointers could arise during optimization rather than in the actual implementation of memcpy), but I am not aware of any compiler that would do this. Note that such an interpretation is at odds with section 6.2 of the standard, which tells us about types and representations. Interpreting the standard so that (void *) &s.a and (void *) &s behave differently means that two things with the same value and type may behave differently, which means a pointer carries something more than its value and type. That does not seem to be the intent of 6.2 or of the standard generally.
The question states:
I'm trying to understand how type-punning works when it comes to storing a value into a member of structure or union.
This is not type-punning as the term is commonly used. Technically, the code does access s.a using lvalues of a different type than its definition (because it uses memcpy, which is defined to copy as if with character type, while the defined type is int), but the bytes originate in an int and are copied without modification. This sort of copying of an object's bytes is generally regarded as a mechanical procedure: it is done to effect a copy, not to reinterpret the bytes in a new type. “Type-punning” usually refers to using different lvalues for the purpose of reinterpreting the value, such as writing an unsigned int and reading a float.
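For contrast, here is type-punning in the usual sense: storing an unsigned integer and reading the same bytes as a float. The sketch assumes float is a 32-bit IEEE 754 type with the same byte order as uint32_t, as on mainstream platforms; the helper names are hypothetical. Reading a union member other than the one last stored is described in C11 6.5.2.3 (footnote 95), and memcpy is the other commonly used route.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32-bit IEEE 754 and shares uint32_t's byte order. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float assumed to be 32 bits");

/* Store through one union member, read through another. */
static float pun_via_union(uint32_t bits) {
    union { uint32_t u; float f; } pun;
    pun.u = bits;
    return pun.f; /* bytes of u reinterpreted as a float */
}

/* The same reinterpretation done by copying the bytes. */
static float pun_via_memcpy(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under those assumptions, pun_via_union(0x3F800000) and pun_via_memcpy(0x3F800000) both return 1.0f.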
In any case, type-punning is not really the subject of the question.
The title asks:
What values can we store in a struct or union members?
The title does not match the content of the question. The title question is easily answered: the values we can store in a member are those values the member’s type can represent. But the question goes on to explore the padding between members, and the padding does not affect the values in the members.
The question quotes the standard:
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
and says:
So I interpreted it as if we have an object to store into a member such that the size of the object equals the sizeof(declared_type_of_the_member) + padding the bytes related to padding will have unspecified value…
The quoted text in the standard means that, if the padding bytes in s have been set to some values, as with memcpy, and we then execute s.a = something;, then the padding bytes are no longer required to hold their previous values.
The code in the question explores a different situation. The call memcpy(&(s.a), repr, sizeof(repr)); does not store a value in a member of the structure in the sense meant in 6.2.6.1 6. It is not storing into either of the members s.a or s.b. It is copying bytes in, which is different from what 6.2.6.1 discusses.
6.2.6.1 6 means that, for example, if we execute this code:
```c
char repr[sizeof s] = { 0 };
memcpy(&s, repr, sizeof s); // Set all the bytes of s to known values.
s.a = 0;                    // Store a value in a member.
memcpy(repr, &s, sizeof s); // Get all the bytes of s to examine them.
for (size_t i = sizeof s.a; i < offsetof(struct first_member_padded_t, b); ++i)
    printf("Byte %zu = %d.\n", i, repr[i]);
```
then it is not necessarily true that all zeros will be printed—the bytes in the padding may have changed.
Upvotes: 3
Reputation: 4537
As @user694733 said, if there is padding between s.a and s.b, memcpy() is accessing a memory area that cannot be accessed through &s.a. It is analogous to this:
```c
int a = 1;
int b;
b = *((char *)&a + sizeof(int)); // reads the byte just past the end of a
```
This is Undefined Behaviour, and it is basically what is happening inside memcpy().
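For comparison, a sketch of byte access that stays inside the object: every read through the character pointer is within [0, sizeof(int)), so it avoids the out-of-bounds read shown above. The helper name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Reads each byte of a through unsigned char, staying in bounds,
   then reassembles the value from the copied bytes. */
static int roundtrip_int_bytes(int a) {
    unsigned char bytes[sizeof a];
    const unsigned char *p = (const unsigned char *) &a;
    for (size_t i = 0; i < sizeof a; ++i)
        bytes[i] = p[i]; /* in-bounds byte access: well defined */
    int b;
    memcpy(&b, bytes, sizeof b);
    return b;
}
```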
Upvotes: 0