David Whitten

Reputation: 199

C++ Unions Internals

I'm trying to learn more about C++ and ran into some code in a chess program that I need help understanding. I have a union such as:

union b_union {

    Bitboard b;
    struct {
    #if defined (BIGENDIAN)
        uint32_t h;
        uint32_t l;
    #else
        uint32_t l;
        uint32_t h;
    #endif
    } dw;
};

The above code falls into the else condition.

Bitboard is defined as uint64_t. If I have a value, let's say 0x0025f780, which is 282578800148862, and I set union.b = 0x0025f780, then union.dw.l is updated to 16843134 and union.dw.h is updated to 65793. Initially, l and h both start off as 3435973836. Internally, what happened? I'm fairly new to C++ and just trying to wrap my head around how unions work internally.

Thanks much for any insights.

David

Upvotes: 1

Views: 341

Answers (4)

Mysticial

Reputation: 471567

The union means that its members occupy the same memory location. In the code sample that you have shown, the intent is to let you reference the upper and lower 32 bits of b directly.

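Note that this code invokes undefined behavior in C++, because you are reading a union member other than the one that was most recently written. (C permits this kind of type punning, and compilers such as GCC document it as a supported extension in C++, but the C++ standard itself does not guarantee it.)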

So b, which is a 64-bit integer, shares the same memory as l and h, which refer to its lower and upper 32 bits. Of course, which half comes first in memory depends on the endianness of the machine, which is why there is the preprocessor if-else.

EDIT: Your particular example is also not correct. But here's a fixed version:

When you set b = 282578800148862 (b = 0x000101010101017e), the upper and lower 32 bits are:

00010101 0101017e

so

l = 0x0101017e = 16843134
h = 0x00010101 = 65793
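
If you want the same two-word view without relying on union type punning, one option is std::memcpy, which is well defined for copying the bytes of one trivially copyable object into another. A minimal sketch, assuming ordinary 8-bit bytes; Words and split_in_memory_order are names made up for this illustration and are not part of the chess program:

```cpp
#include <cstdint>
#include <cstring>

// The two 32-bit words of a 64-bit value, in the order they sit in memory.
// Words is a hypothetical helper type, not part of the original program.
struct Words {
    std::uint32_t first;   // word at the lower address
    std::uint32_t second;  // word at the higher address
};

// Copy the bytes of b into two 32-bit words instead of reading them
// through an inactive union member.
inline Words split_in_memory_order(std::uint64_t b) {
    std::uint32_t parts[2];
    static_assert(sizeof(parts) == sizeof(b),
                  "two 32-bit words must cover 64 bits");
    std::memcpy(parts, &b, sizeof(b));
    return Words{parts[0], parts[1]};
}
```

On a little-endian machine, split_in_memory_order(0x000101010101017e) gives first == 0x0101017e (the l above) and second == 0x00010101 (the h above). A big-endian machine gives them the other way around, which is exactly what the #if in the union compensates for.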

Upvotes: 5

Lindydancer

Reputation: 26164

Basically, a union lets you describe several ways a single piece of memory is used. The normal case is to store two unrelated values at the same location, which works as long as you use only one at a time. (Writing to one variant destroys the other.)

Another very common use of unions is to access parts of another element (which, by the way, is undefined behavior). In your case, there are two views of a 64-bit integer: one is the entire integer and the other is its two halves, as separate 32-bit entities.

Note that different computers store a 64-bit value differently. Some store the bytes from most significant to least significant (big endian), some the other way around (little endian), and some use a mixed form (mixed endian). The names, by the way, come from Gulliver's Travels, where some people ate the egg from the big side and some from the pointy side.

In your case, I would suggest that you drop the union altogether and access the parts using:

low = uint32_t(b);
high = uint32_t(b >> 32);

The above will work on all architectures and is as fast as, or even faster than, the union.
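
As a rough, self-contained sketch of that suggestion, using the number from the question (the helper names low32, high32 and join32 are made up here, and constexpr assumes C++11 or later):

```cpp
#include <cstdint>

typedef std::uint64_t Bitboard;  // same definition as in the question

// Low 32 bits: converting to uint32_t keeps the value modulo 2^32.
constexpr std::uint32_t low32(Bitboard b) {
    return static_cast<std::uint32_t>(b);
}

// High 32 bits: shift them down first, then truncate.
constexpr std::uint32_t high32(Bitboard b) {
    return static_cast<std::uint32_t>(b >> 32);
}

// Rebuild the 64-bit value from its two halves.
constexpr Bitboard join32(std::uint32_t high, std::uint32_t low) {
    return (static_cast<Bitboard>(high) << 32) | low;
}

// Checked at compile time with the values from the question.
static_assert(low32(282578800148862ULL) == 16843134U, "low half");
static_assert(high32(282578800148862ULL) == 65793U, "high half");
static_assert(join32(65793U, 16843134U) == 282578800148862ULL, "round trip");
```

Because the shifts operate on the value rather than on its bytes in memory, no BIGENDIAN check is needed.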

Upvotes: 1

littleadv

Reputation: 20282

You'd be better off dealing with hex numbers in this case.

What happens is that the struct dw and the uint64_t b occupy the same space in memory. The l and h members represent the low and high 32-bit portions of b.

On a big-endian machine, the high 32 bits come first in memory (at the lower address), so h has to be declared before l. On a little-endian machine it's exactly the opposite. That's why you have the #if/#else there.

This makes l the low 32 bits of b and h the high 32 bits of b. For b = 0x0025f780, which fits in 32 bits, that means l = 0x0025f780 and h = 0.

The actual values that you mentioned don't make much sense, and you probably have some other issue there. 282578800148862 is not 0x0025f780.

You have to be careful with unions because the underlying data representation may be different. For example, a compiler may insert padding into a struct for alignment, in which case the actual memory locations of its members won't be where you expect them to be. Two uint32_t members normally pack with no padding between them, but it is worth verifying that rather than assuming it.
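
One way to check the layout rather than assume it is to let the compiler verify it. The sketch below uses the little-endian branch of the question's union and gives the inner struct a hypothetical name, dw_t, so that offsetof can refer to it. The expected sizes and offsets are assumptions that hold on the mainstream platforms I know of; if one of them is wrong for your target, the build fails instead of silently reading the wrong bytes.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

typedef std::uint64_t Bitboard;  // same definition as in the question

// Named version of the struct inside b_union (little-endian branch).
// The name dw_t is hypothetical; the original struct is unnamed.
struct dw_t {
    std::uint32_t l;
    std::uint32_t h;
};

union b_union {
    Bitboard b;
    dw_t dw;
};

// Compile-time layout checks: each one fails the build if it does not hold.
static_assert(sizeof(dw_t) == 8, "no padding between or after l and h");
static_assert(offsetof(dw_t, l) == 0, "l starts at the first byte");
static_assert(offsetof(dw_t, h) == 4, "h follows l directly");
static_assert(sizeof(b_union) == sizeof(Bitboard), "the union is no larger than b");
```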

Upvotes: 0

Marc DiMillo

Reputation: 513

A union holds only one value at a time. It can declare multiple members, but they share the same storage, so writing one overwrites the previous one. In your case, setting union.b also changed what l and h read as, because they occupy the same bytes as the Bitboard. You can't hold the Bitboard value and a separate struct value at once; it has to be one or the other. So when you went back to check, your old values had already been overwritten. I think a struct is better suited if you need both at once, but you can always try stepping through the code if you are unsure.

Upvotes: 1
