What is the Implementation of Generics for the .NET Common Language Runtime

Question

When you use generic collections in C# (or .NET in general), does the compiler basically do the leg-work developers used to have to do of making a collection for a specific type? So basically . . . it just saves us work?

Now that I think about it, that can't be right. Because without generics, we used to have to make collections that used a non-generic array internally, and so there was boxing and unboxing (if it was a collection of value types), etc.

So, how are generics rendered in CIL? What does it do to implement a generic collection when we say we want one? I don't necessarily want CIL code examples (though that would be ok), I want to know the concepts of how the compiler takes our generic collections and renders them.

Thanks!

P.S. I know that I could use ildasm to look at this, but CIL still looks like Chinese to me, and I am not ready to tackle that. I just want the concepts of how C# (and other languages too, I guess) render in CIL to handle generics.

user530189 · Accepted Answer

Forgive my verbose post, but this topic is quite broad. I'm going to attempt to describe what the C# compiler emits and how that's interpreted by the JIT compiler at runtime.

ECMA-335 (it's a really well written design document; check it out) is where it's at for knowing how everything, and I mean everything, is represented in a .NET assembly. There are a few related CLI metadata tables for generic information in an assembly:

GenericParam - Stores information about a generic parameter (index, flags, name, owning type/method).
GenericParamConstraint - Stores information about a generic parameter constraint (owning generic parameter, constraint type).
MethodSpec - Stores instantiated generic method signatures (e.g. Bar.Method<int> for Bar.Method<T>).
TypeSpec - Stores instantiated generic type signatures (e.g. Bar<int> for Bar<T>).
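
As a rough illustration of what GenericParam and GenericParamConstraint record, the reflection API exposes the same facts (position, name, constraints) at runtime. This is only a sketch: the type Bar<T> with an IComparable constraint and the helper GenericParamDemo.Describe are made up for the example, and reflection shows the loaded view of the metadata rather than the raw table rows.

```csharp
using System;
using System.Linq;

// Hypothetical type used only for this illustration.
class Bar<T> where T : IComparable
{
    public void Method<M>(M value) { }
}

static class GenericParamDemo
{
    // Describes each generic parameter of an open generic type as
    // "position:name" followed by ":constraint" for every constraint type.
    public static string[] Describe(Type openType) =>
        openType.GetGenericArguments()
            .Select(p => p.GenericParameterPosition + ":" + p.Name +
                string.Concat(p.GetGenericParameterConstraints().Select(c => ":" + c.Name)))
            .ToArray();

    static void Main()
    {
        foreach (string line in Describe(typeof(Bar<>)))
            Console.WriteLine(line); // 0:T:IComparable
    }
}
```

The position and name correspond to the index and name columns described above, and the constraint list corresponds to the rows that GenericParamConstraint would hold for that parameter.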

So with this in mind, let's walk through a simple example using this class:

class Foo<T>
{
    public T SomeProperty { get; set; }
}

When the C# compiler compiles this example, it will define Foo<T> in the TypeDef metadata table, like it would for any other type. Unlike a non-generic type, it will also have an entry in the GenericParam table that will describe its generic parameter (index = 0, flags = ?, name = (index into String heap, "T"), owner = type "Foo").

One of the columns of data in the TypeDef table is the starting index into the MethodDef table, which marks the start of the contiguous list of methods defined on this type. For Foo<T>, we've defined three methods: a getter and a setter for SomeProperty and a default constructor supplied by the compiler. As a result, the MethodDef table would hold a row for each of these methods. One of the important columns in the MethodDef table is the "Signature" column. This column stores a reference to a blob of bytes that describes the exact signature of the method. ECMA-335 goes into great detail about these metadata signature blobs, so I won't regurgitate that information here.

The method signature blob contains type information about the parameters as well as the return value. In our example, the setter takes a T and the getter returns a T. Well, what is a T then? In the signature blob, it's going to be a special value that means "the generic type parameter at index 0". This refers to the row in the GenericParam table that has index = 0 with owner = type "Foo", which is our "T".

The same thing goes for the auto-property backing store field. Foo's entry in the TypeDef table will have a starting index into the Field table and the Field table has a "Signature" column. The field's signature will denote that the field's type is "the generic type parameter at index 0".
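
One way to observe this from managed code is to inspect the open type with reflection. The sketch below assumes the Foo<T> class from above; the OpenTypeDemo helper is hypothetical. Because the backing field's name is a compiler implementation detail, the sketch simply takes the only private instance field.

```csharp
using System;
using System.Reflection;

class Foo<T>
{
    public T SomeProperty { get; set; }
}

static class OpenTypeDemo
{
    // Type of SomeProperty as declared on the open generic type Foo<>.
    public static Type PropertyType() =>
        typeof(Foo<>).GetProperty("SomeProperty").PropertyType;

    // Type of the compiler-generated backing field (the only private instance field).
    public static Type BackingFieldType() =>
        typeof(Foo<>).GetFields(BindingFlags.Instance | BindingFlags.NonPublic)[0].FieldType;

    // Number of methods declared directly on Foo<> (the getter and the setter).
    public static int DeclaredMethodCount() =>
        typeof(Foo<>).GetMethods(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.DeclaredOnly).Length;

    static void Main()
    {
        Type p = PropertyType();
        Console.WriteLine(p.IsGenericParameter);                 // True
        Console.WriteLine(p.GenericParameterPosition);           // 0
        Console.WriteLine(BackingFieldType() == p);              // True
        Console.WriteLine(DeclaredMethodCount());                // 2
        Console.WriteLine(typeof(Foo<>).GetConstructors().Length); // 1
    }
}
```

Both the property and the field report a type that is a generic parameter at position 0, not a concrete type, which matches what the signature blobs encode.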

This is all well and good, but where does the code generation come into play when T is different types? It's actually the responsibility of the JIT compiler to generate the code for the generic instantiations and not the C# compiler.

Let's take a look at an example:

Foo<int> f1 = new Foo<int>();
f1.SomeProperty = 10;
Foo<string> f2 = new Foo<string>();
f2.SomeProperty = "hello";

This will compile to something like this CIL:

newobj <MemberRefToken1> // new Foo<int>()
stloc.0 // Store in local "f1"
ldloc.0 // Load local "f1"
ldc.i4.s 10 // Load a constant 32-bit integer with value 10
callvirt <MemberRefToken2> // Call f1.set_SomeProperty(10)
newobj <MemberRefToken3> // new Foo<string>()
stloc.1 // Store in local "f2"
ldloc.1 // Load local "f2"
ldstr <StringToken> // Load "hello" (which is in the user string heap)
callvirt <MemberRefToken4> // Call f2.set_SomeProperty("hello")

So what's this MemberRefToken business? A MemberRefToken is a metadata token (tokens are four-byte values: the most significant byte is a metadata table identifier and the remaining three bytes are the 1-based row number) that references a row in the MemberRef metadata table. This table stores a reference to a method or field. Before generics, this is the table that would store information about methods/fields you're using from types defined in referenced assemblies. However, it can also be used to reference a member on a generic instantiation. So let's say that MemberRefToken1 refers to the first row in the MemberRef table. It might contain this data: class = TypeSpecToken1, name = ".ctor", blob = <reference to the constructor's signature blob>.
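
The token layout can be sketched in a few lines. The TokenDemo helper is hypothetical; the table identifiers used (0x02 for TypeDef, 0x06 for MethodDef, 0x0A for MemberRef) are the ones listed in ECMA-335. Reflection hands out TypeDef and MethodDef tokens for definitions, so the MemberRef case is shown with a hand-written token value rather than one read from a real assembly.

```csharp
using System;

static class TokenDemo
{
    // Most significant byte: the metadata table identifier.
    public static int TableId(int token) => (int)((uint)token >> 24);

    // Remaining three bytes: the 1-based row number in that table.
    public static int RowNumber(int token) => token & 0x00FFFFFF;

    static void Main()
    {
        // A made-up token meaning "row 1 of the MemberRef table (0x0A)".
        int memberRefToken1 = 0x0A000001;
        Console.WriteLine($"table 0x{TableId(memberRefToken1):X2}, row {RowNumber(memberRefToken1)}");

        // Real tokens for definitions in this assembly.
        int typeToken = typeof(TokenDemo).MetadataToken;
        int methodToken = typeof(TokenDemo).GetMethod(nameof(TableId)).MetadataToken;
        Console.WriteLine($"type:   table 0x{TableId(typeToken):X2}, row {RowNumber(typeToken)}");
        Console.WriteLine($"method: table 0x{TableId(methodToken):X2}, row {RowNumber(methodToken)}");
    }
}
```

The first line printed is "table 0x0A, row 1"; the row numbers for the real tokens depend on how the assembly was compiled.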

TypeSpecToken1 would refer to the first row in the TypeSpec table. From above we know this table stores the instantiations of generic types. In this case, this row would contain a reference to a signature blob for "Foo<int>". So this MemberRefToken1 is really saying we are referencing "Foo<int>.ctor()".

MemberRefToken1 and MemberRefToken2 would share the same class value, i.e. TypeSpecToken1. They would differ, however, on the name and signature blob (MemberRefToken2 would be for "set_SomeProperty"). Likewise, MemberRefToken3 and MemberRefToken4 would share TypeSpecToken2, the instantiation of "Foo<string>", but differ on the name and blob in the same way.

When the JIT compiler compiles the above CIL, it notices that it's seeing a generic instantiation it hasn't seen before (i.e. Foo<int> or Foo<string>). What happens next is covered pretty well by Shiv Kumar's answer, so I won't repeat it in detail here. Simply put, when the JIT compiler encounters a new instantiated generic type, it may emit a whole new type into its type system with a field layout using the actual types in the instantiation in place of the generic parameters. These types would also have their own method tables, and JIT compilation of each method would involve replacing references to the generic parameters with the actual types from the instantiation. It's also the responsibility of the JIT compiler to enforce correctness and verifiability of the CIL.

So to sum up: the C# compiler emits metadata describing what's generic and how generic types/methods are instantiated. The JIT compiler uses this information to emit new types at runtime for instantiated generic types (assuming an instantiation isn't compatible with an existing one), and each such type will have its own copy of the code, JIT compiled based on the actual types used in the instantiation.
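
The type-level part of this can be observed with reflection. The sketch below assumes the Foo<T> class from earlier; the InstantiationDemo helper is hypothetical. It only shows that the runtime treats the two instantiations as distinct types with substituted field types; it says nothing about whether the generated machine code is shared or separate.

```csharp
using System;
using System.Reflection;

class Foo<T>
{
    public T SomeProperty { get; set; }
}

static class InstantiationDemo
{
    // Type of the backing field (the only private instance field) on a closed Foo<...>.
    public static Type FieldTypeOf(Type closedFoo) =>
        closedFoo.GetFields(BindingFlags.Instance | BindingFlags.NonPublic)[0].FieldType;

    static void Main()
    {
        Type fooInt = typeof(Foo<int>);
        Type fooString = typeof(Foo<string>);

        // Two distinct runtime types...
        Console.WriteLine(fooInt == fooString); // False
        // ...created from the same generic type definition.
        Console.WriteLine(fooInt.GetGenericTypeDefinition() == fooString.GetGenericTypeDefinition()); // True

        // The field layout uses the actual type arguments in place of T.
        Console.WriteLine(FieldTypeOf(fooInt));    // System.Int32
        Console.WriteLine(FieldTypeOf(fooString)); // System.String
    }
}
```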

Hopefully this made sense in some small way.
