Reputation: 449
I read that in unions, the data members occupy the same block of memory. So I tried to read off the ASCII codes of the English alphabet using this implementation:
#include <iostream>
using namespace std;

int main() {
    union {
        int i;
        char a, b;
    } eps;
    eps.i = 65;
    cout << eps.a << eps.b;
}
I got the right output (A) for 65, but both a and b seem to occupy the same place in memory.
Q. But an integer being 2 bytes, shouldn't a have occupied the first 8 bits and b the other 8?
Also, when I repeat the above with multiple integers inside the union, they all seem to have the same value.
Q. So does that mean that every variable of a given data type acts like a reference to any other variable of the same data type? (Given that I simply add variables like int i, j, k, l, ...)
Q. Can we only use one (distinct) variable of a given datatype in a union since all others point at the same location?
EDIT
I would like to mention that by adding more variables inside the union, I simply mean declaring them like int i, j, k..., not wrapping them inside a struct or in any other way.
As clarified by Baum mit in the chat (and comments), here's the discussion for other and future users to see.
Upvotes: 2
Views: 4815
Reputation: 14750
Recall that a union type is a set of alternative possibilities. The formal wording is that it's the coproduct of all the types its fields belong to.
union {
    int i;
    char a, b;
};
is syntactically equivalent to:
union {
    int i;
    char a;
    char b;
};
Since a and b are of the same type, together they contribute nothing more than either one does individually. In other words, b is redundant.
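Giving the question's union a name lets the compiler confirm this. This is a minimal C++11 sketch. The name Eps is an assumption, added only so that offsetof can refer to the type:

```cpp
#include <cstddef>  // offsetof

// The union from the question, given a hypothetical name (Eps) so that
// offsetof can refer to it.
union Eps {
    int i;
    char a, b;  // two separate alternatives of the same type, not a pair
};

// Every union member starts at offset 0, so a and b overlap completely.
static_assert(offsetof(Eps, a) == 0, "a starts at the beginning of the union");
static_assert(offsetof(Eps, b) == 0, "so does b");

// The union only needs room for its largest member (plus any padding).
static_assert(sizeof(Eps) >= sizeof(int), "the union can hold an int");
```

Writing 65 to i and reading a or b therefore looks at the very same byte.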
You need to wrap the a and b fields in a struct to get them bundled as one alternative of the union.
union {
    int i;
    struct {  // anonymous struct: standard in C11, a common extension in C++
        char a;
        char b;
    };
};
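The same kind of check works for the wrapped version. Standard C++ has no anonymous structs, so this sketch gives the inner struct a name. CharPair, pair and Eps2 are hypothetical names:

```cpp
#include <cstddef>  // offsetof

// Standard C++ has no anonymous structs (most compilers accept them as an
// extension), so this sketch names the struct; CharPair, pair and Eps2 are
// hypothetical names.
struct CharPair {
    char a;
    char b;
};

union Eps2 {
    int i;
    CharPair pair;  // a and b together form ONE alternative of the union
};

// The struct starts at offset 0, like every union member...
static_assert(offsetof(Eps2, pair) == 0, "members start at offset 0");
// ...and inside it b follows a (no padding between chars on mainstream compilers).
static_assert(offsetof(CharPair, b) == offsetof(CharPair, a) + 1, "b follows a");
```

Now a and b occupy different bytes. Which bytes of i they alias still depends on the platform.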
Furthermore, on most platforms int is a 32-bit integral type and char an 8-bit one. I say usually because the sizes are not formally fixed: the standard only guarantees that char is one byte of at least 8 bits, that int covers at least the 16-bit range, and that int is at least as large as char.
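The guarantees the language does give can be written down as compile-time checks. This is a small sketch, and int_bits is a hypothetical name:

```cpp
#include <climits>  // CHAR_BIT, INT_MAX

// Guarantees shared by C and C++ (these are not platform assumptions):
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(INT_MAX >= 32767, "int covers at least the 16-bit signed range");
static_assert(sizeof(int) >= sizeof(char), "int is at least as large as char");

// Width of int on the current platform: 32 on most desktop and server targets.
constexpr int int_bits = static_cast<int>(sizeof(int)) * CHAR_BIT;
```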
So, assuming the usual sizes for char and int, the second alternative is only 16 bits wide. Like every union member, it starts at the beginning of the union, but it overlaps only part of the 32 bits occupied by the larger field.
Another issue is byte ordering (endianness), which can differ from one platform to the next.
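Here is one way to find the byte order at run time. The sketch assumes std::uint32_t is available (it is on all mainstream platforms), and is_little_endian is a hypothetical name:

```cpp
#include <cstdint>
#include <cstring>  // std::memcpy

// Returns true on little-endian hosts (least significant byte stored first,
// as on x86). Copying bytes out with memcpy is well-defined, unlike reading
// an inactive union member. The function name is hypothetical.
bool is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &one, 1);  // the byte at the lowest address
    return first_byte == 1;
}
```

In C++20, std::endian::native from <bit> answers the same question at compile time.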
You could perhaps get it to work (and in practice it almost always works) by padding the struct with the missing bytes to reach 32 bits:
union {
    int i;
    struct {
        char a;
        char b;
        char c;
        char d;
    };
};
(Think of an int representation of an IPv4 address, for instance, and the htonl function to cover the byte ordering issue; htons is its 16-bit counterpart.)
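Here is a sketch of the IPv4 case. It assumes a POSIX system, where htonl is declared in <arpa/inet.h>. ipv4_to_bytes is a hypothetical helper:

```cpp
#include <arpa/inet.h>  // htonl (POSIX; on Windows it lives in <winsock2.h>)
#include <cstdint>
#include <cstring>      // std::memcpy

// Copies a host-order IPv4 address into four bytes in network byte order
// (big-endian). After htonl, out[0] is always the most significant byte,
// whatever the host's own byte order. ipv4_to_bytes is a hypothetical name.
void ipv4_to_bytes(std::uint32_t host_order_addr, unsigned char out[4]) {
    const std::uint32_t net_order = htonl(host_order_addr);
    std::memcpy(out, &net_order, sizeof net_order);
}
```

For 192.168.0.1 (0xC0A80001), this yields the bytes 192, 168, 0, 1 on both little- and big-endian hosts.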
The definitive rule, however, comes from the C language specification, which leaves that point unspecified.
To be on the safe side, rather than using a union, I would go for a set of functions that pull out bytes by bit masking. But if you are targeting a specific platform and the above union works...
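The bit-masking approach might look like this. The sketch assumes a 32-bit unsigned value, and byte_at and pack_bytes are hypothetical names:

```cpp
#include <cstdint>

// Returns byte n (0 = least significant, valid for n = 0..3) of a 32-bit
// value using shifts and masks. The result is the same on every platform
// because it works on the value, not on its memory layout.
constexpr std::uint8_t byte_at(std::uint32_t value, unsigned n) {
    return static_cast<std::uint8_t>((value >> (8u * n)) & 0xFFu);
}

// The inverse: packs four bytes (b0 least significant) back into one value.
constexpr std::uint32_t pack_bytes(std::uint8_t b0, std::uint8_t b1,
                                   std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}
```

For the question's value, byte_at(65, 0) is 65, which static_cast<char> turns into 'A' on ASCII systems.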
Upvotes: 2
Reputation: 50101
In C++, reading a member of a union that is not the one you last wrote to is undefined behavior. This means that your code could do anything, and arguing about its behavior is not meaningful.
To perform conversion between types, use the appropriate cast, not a union.
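For the question's goal of getting letters from their ASCII codes, a cast is all that is needed. This sketch assumes an ASCII-compatible character set, and alphabet_from_codes is a hypothetical name:

```cpp
#include <string>

// Builds "A".."Z" from the ASCII codes 65..90 with an explicit cast, which is
// well-defined, unlike reading an inactive union member.
std::string alphabet_from_codes() {
    std::string letters;
    for (int code = 65; code <= 90; ++code) {
        letters += static_cast<char>(code);
    }
    return letters;
}
```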
To answer your questions after the edit:
Q. But an integer being 2 bytes, shouldn't a have occupied the first 8 bits and b the other 8?
As you said, every member of the union shares the same space. Since a and b are different members, they share the same space too (in the sense that they both live somewhere in the space belonging to the union). The actual layout of the union might look like this:
byte 0 | byte 1 | byte 2 | byte 3
  i    |   i    |   i    |   i
  a    |        |        |
  b    |        |        |
Q. So does that mean that every variable of a given data type acts as a reference for any other variable for the same data type?
No, members of the same type do not act as references to one another. If you have a reference to an object, you can reliably access that object through the reference. Two members of the same type will probably use the exact same memory, but you cannot rely on that. The rule I stated above still applies.
Q. Can we only use one (distinct) variable of a given datatype in a union since all others point at the same location?
You can have as many members of the same type as you want. They might or might not live in the exact same memory. It does not matter, because you can only access the last one written to anyway.
Upvotes: 6
Reputation: 46339
You have misunderstood what unions are for. They make no guarantees about sharing memory in any predictable way. They simply provide a way to say an entity could store one of several types of information. If you set one type, the others are undefined (could be anything, even something unrelated to the data you put in). How they share memory is up to the compiler, and could depend on optimisations which are enabled (e.g. memory alignment rules).
Having said all that, in most situations (and with optimisations disabled), you will find that each part of a union begins at byte 0 of the union (do not rely on this). In your code, union { int i; char a, b; } says "this union could be an integer i, or a char a, or a char b". You could use (as many have suggested) union { int i; struct { char a, b; }; }, which would tell the compiler: "this union could be an integer i, or it could be a structure of characters a and b".
Casting from one type to another, or to its component bytes, therefore is not a job for unions. Instead you should use casts.
So where would you use a union? Here's an example:
struct Value {
    int type;  // maybe 0 = int, 1 = long, ...
    union {
        char c;
        int i;
        long l;
        float f;
        double d;
        struct {
            int x;
            int y;
        } pos;
        // etc.
    } value;
};
With an object like that, we can dynamically store numbers of any type (or whatever else we might want, like a 2D position in this example), while keeping track of what's actually there using a separate tag variable. Because the members share storage, it uses much less memory than equivalent code without a union would, and it makes setting and getting safe (we don't need to cast pointers all over the place).
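Here is a sketch of setting and reading such an object, with the tag checked before every read. Kind, Value and the helper functions are hypothetical names, and an enum class stands in for the plain int tag:

```cpp
// Hypothetical names throughout; an enum class replaces the int tag so that
// the valid tags are spelled out.
enum class Kind { Int, Double, Pos };

struct Value {
    Kind type;  // records which union member is active
    union {
        int i;
        double d;
        struct {
            int x;
            int y;
        } pos;
    } value;
};

Value make_int(int i) {
    Value v{};
    v.type = Kind::Int;
    v.value.i = i;  // i becomes the active member
    return v;
}

Value make_double(double d) {
    Value v{};
    v.type = Kind::Double;
    v.value.d = d;
    return v;
}

Value make_pos(int x, int y) {
    Value v{};
    v.type = Kind::Pos;
    v.value.pos.x = x;
    v.value.pos.y = y;
    return v;
}

// Checks the tag before reading, so only the member last written is read.
// Returns false when the stored value is not a number.
bool try_get_number(const Value& v, double& out) {
    switch (v.type) {
        case Kind::Int:    out = v.value.i; return true;
        case Kind::Double: out = v.value.d; return true;
        case Kind::Pos:    return false;
    }
    return false;
}
```

Since C++17, std::variant packages the same idea, with the tag maintained for you.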
Upvotes: 3