Union fields in .NET - can they really work in managed code?

Question

I defined a struct like this in C#:

[StructLayout(LayoutKind.Explicit)]
public struct MyUnion
{
    [FieldOffset(0)]
    public string MyString;
    [FieldOffset(0)]
    public Version MyVersion;
}

According to the documentation for [FieldOffset] it affects the unmanaged representation of the struct. But to my surprise it seems to work just as well in managed code: when I profile the memory usage in dotTrace each MyUnion instance is the size of one pointer (8 bytes on x64)! The values still seem to be perfectly safe, too:

var stringInside = new MyUnion { MyString = "The string" };
var versionInside = new MyUnion { MyVersion = new Version(1, 2, 3, 4) };
Console.WriteLine(stringInside.MyString); // The string
Console.WriteLine(versionInside.MyVersion); // 1.2.3.4

But wait, what if I access the wrong field?

var whatIsThis = stringInside.MyVersion;
var andThis = versionInside.MyString;
Console.WriteLine("{0} (type = {1})", whatIsThis, whatIsThis.GetType().FullName); // The string (type = System.String)
Console.WriteLine("{0} (type = {1})", andThis, andThis.GetType().FullName); // 1.2.3.4 (type = System.Version)

This still "works" in the sense that the real type of the contained object is preserved, but of course there is now a disconnect between what the compiler thinks and what the runtime thinks, e.g.:

Console.WriteLine("Compiler: is it a string? {0}", versionInside.MyString is string); // True
Console.WriteLine("Runtime: is it a version? {0}", versionInside.MyString.GetType() == typeof(Version)); // True

How dangerous is it to use unions like this? Can I rely on the behaviour I see here? Is it likely to break in some other ways? In particular, is it safe to use code like this?

if (versionInside.MyString.GetType() == typeof(string))
{
    Console.WriteLine("OK, it's a string, use the MyString field");
}
else
{
    Console.WriteLine("OK, it's a Version, use the MyVersion field");
}

Hans Passant · Accepted Answer

That is just fine. The only scenario that is not supported is overlapping a value type field with a reference type field. In that case the GC can no longer reliably determine whether or not the slot contains an object reference, so the CLR slams the emergency stop early: you'll get a TypeLoadException when the type is loaded.
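To see the unsupported case in action, here is a minimal sketch. The names BadUnion, Demo and Touch are made up for illustration. The struct is touched from a separate, non-inlined method on the assumption that the runtime loads the type when that method is compiled, which lets the caller catch the exception.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical type: a reference field overlapped by a value type field.
[StructLayout(LayoutKind.Explicit)]
public struct BadUnion
{
    [FieldOffset(0)] public string Text;
    [FieldOffset(0)] public long Number;
}

public static class Demo
{
    // Kept out of line so that BadUnion is only loaded when this method runs,
    // not while the calling method is being compiled.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static long Touch()
    {
        var u = new BadUnion { Number = 42 };
        return u.Number;
    }

    // Returns true if the runtime refused to load BadUnion.
    public static bool RuntimeRejectsBadUnion()
    {
        try
        {
            Touch();
            return false;
        }
        catch (TypeLoadException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Rejected by the runtime: {0}", RuntimeRejectsBadUnion());
    }
}
```

The C# compiler accepts the declaration without complaint; the failure only shows up at run time, the first time code that uses the type is executed.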

The more general form of such a union is a discriminated union; the variant type is the canonical example. It has another field that indicates which of the overlapping fields is in use. In effect you already have this in your example: every object has an otherwise hidden field that indicates its type, known as the "type handle" or "method-table pointer". Object.GetType() uses it. It is also the field that the garbage collector uses to discover the actual type of the object, since the declared type is not useful: it could be a base class or an interface.
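One way to use the object's own type as the discriminator is sketched below. This is not code from the question: TaggedUnion, its members, and in particular the third overlapping field declared as object are assumptions made for illustration. The type test goes through the object-typed view, so it does not depend on how the compiler treats an expression whose declared type is already string (as with the "is string" check in the question).

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct TaggedUnion
{
    // Hypothetical extra view of the same slot, used only for type tests.
    [FieldOffset(0)] private object _raw;
    [FieldOffset(0)] private string _text;
    [FieldOffset(0)] private Version _version;

    public static TaggedUnion FromString(string value)
    {
        var u = default(TaggedUnion);
        u._text = value;
        return u;
    }

    public static TaggedUnion FromVersion(Version value)
    {
        var u = default(TaggedUnion);
        u._version = value;
        return u;
    }

    // The typed field is only read after the stored object has been
    // confirmed to have the matching type.
    public bool TryGetString(out string value)
    {
        if (_raw is string)
        {
            value = _text;
            return true;
        }
        value = null;
        return false;
    }

    public bool TryGetVersion(out Version value)
    {
        if (_raw is Version)
        {
            value = _version;
            return true;
        }
        value = null;
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        var u = TaggedUnion.FromVersion(new Version(1, 2, 3, 4));
        if (u.TryGetString(out var s))
            Console.WriteLine("string: " + s);
        else if (u.TryGetVersion(out var v))
            Console.WriteLine("version: " + v);
    }
}
```

Keeping the fields private and exposing only the TryGet methods means callers cannot read the slot through the wrong declared type by accident.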

You'll inevitably run into trouble when you overlap two value type values: now you can't know which one is actually stored unless another field tells you. If you read the wrong one, you'll just read garbage. Writing cannot cause memory corruption, because the structure is large enough to contain the largest type. That kind of trouble is never that hard to diagnose, or to predict.
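For overlapping value types, the extra field has to be written by hand. The following is a small sketch with made-up names (NumberKind, NumberUnion) that keeps a tag at offset 0 and overlaps an int and a float at offset 4. It also shows what "reading garbage" means in practice: the bits of the float are simply reinterpreted as an int.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical tag that records which overlapping field is in use.
public enum NumberKind : byte { Int32, Single }

[StructLayout(LayoutKind.Explicit)]
public struct NumberUnion
{
    [FieldOffset(0)] public NumberKind Kind;
    [FieldOffset(4)] public int AsInt32;
    [FieldOffset(4)] public float AsSingle;

    public static NumberUnion FromInt32(int value)
    {
        return new NumberUnion { Kind = NumberKind.Int32, AsInt32 = value };
    }

    public static NumberUnion FromSingle(float value)
    {
        return new NumberUnion { Kind = NumberKind.Single, AsSingle = value };
    }

    // Consults the tag before deciding which field to read.
    public override string ToString()
    {
        return Kind == NumberKind.Int32 ? AsInt32.ToString() : AsSingle.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var one = NumberUnion.FromSingle(1.0f);

        // Correct: the tag says Single, so ToString reads AsSingle.
        Console.WriteLine(one);

        // Wrong field: the IEEE 754 bit pattern of 1.0f (0x3F800000) read as an int.
        Console.WriteLine(one.AsInt32); // 1065353216
    }
}
```

Nothing is corrupted by the wrong read or by a later write through either field; both fields occupy the same four bytes, so the only risk is misinterpreting them.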
