Reputation: 12526
When you use generic collections in C# (or .NET in general), does the compiler basically do the leg-work developers used to have to do of making a generic collection for a specific type. So basically . . . it just saves us work?
Now that I think about it, that can't be right. Because without generics, we used to have to make collections that used a non-generic array internally, and so there was boxing and unboxing (if it was a collection of value types), etc.
So, how are generics rendered in CIL? What is it doing to impliment when we say we want a generic collection of something? I don't necessarily want CIL code examples (though that would be ok), I want to know the concepts of how the compiler takes our generic collections and renders them.
Thanks!
P.S. I know that I could use ildasm to look at this but I CIL still looks like chinese to me, and I am not ready to tackle that. I just want the concepts of how C# (and other languages I guess too) render in CIL to handle generics.
Upvotes: 13
Views: 3965
Reputation:
Forgive my verbose post, but this topic is quite broad. I'm going to attempt to describe what the C# compiler emits and how that's interpreted by the JIT compiler at runtime.
ECMA-335 (it's a really well written design document; check it out) is where it's at for knowing how everything, and I mean everything, is represented in a .NET assembly. There are a few related CLI metadata tables for generic information in an assembly:
So with this in mind, let's walk through a simple example using this class:
class Foo<T>
{
public T SomeProperty { get; set; }
}
When the C# compiler compiles this example, it will define Foo in the TypeDef metadata table, like it would for any other type. Unlike a non-generic type, it will also have an entry in the GenericParam table that will describe its generic parameter (index = 0, flags = ?, name = (index into String heap, "T"), owner = type "Foo").
One of the columns of data in the TypeDef table is the starting index into the MethodDef table that is the continuous list of methods defined on this type. For Foo, we've defined three methods: a getter and a setter to SomeProperty and a default constructor supplied by the compiler. As a result, the MethodDef table would hold a row for each of these methods. One of the important columns in the MethodDef table is the "Signature" column. This column stores a reference to a blob of bytes that describes the exact signature of the method. ECMA-335 goes into great detail about these metadata signature blobs, so I won't regurgitate that information here.
The method signature blob contains type information about the parameters as well as the return value. In our example, the setter takes a T and the getter returns a T. Well, what is a T then? In the signature blob, it's going to be a special value that means "the generic type parameter at index 0". This means the row in the GenericParams table that has index=0 with owner=type "Foo", which is our "T".
The same thing goes for the auto-property backing store field. Foo's entry in the TypeDef table will have a starting index into the Field table and the Field table has a "Signature" column. The field's signature will denote that the field's type is "the generic type parameter at index 0".
This is all well and good, but where does the code generation come into play when T is different types? It's actually the responsibility of the JIT compiler to generate the code for the generic instantiations and not the C# compiler.
Let's take a look at an example:
Foo<int> f1 = new Foo<int>();
f1.SomeProperty = 10;
Foo<string> f2 = new Foo<string>();
f2.SomeProperty = "hello";
This will compile to something like this CIL:
newobj <MemberRefToken1> // new Foo<int>()
stloc.0 // Store in local "f1"
ldloc.0 // Load local "f1"
ldc.i4.s 10 // Load a constant 32-bit integer with value 10
callvirt <MemberRefToken2> // Call f1.set_SomeProperty(10)
newobj <MemberRefToken3> // new Foo<string>()
stloc.1 // Store in local "f2"
ldloc.1 // Load local "f2"
ldstr <StringToken> // Load "hello" (which is in the user string heap)
callvirt <MemberRefToken4> // Call f2.set_SomeProperty("hello")
So what's this MemberRefToken business? A MemberRefToken is a metadata token (tokens are four byte values with the most-significant-byte being a metadata table identifier and the remaining three bytes are the row number, 1-based) that references a row in the MemberRef metadata table. This table stores a reference to a method or field. Before generics, this is the table that would store information about methods/fields you're using from types defined in referenced assemblies. However, it can also be used to reference a member on a generic instantiation. So let's say that MemberRefToken1 refers to the first row in the MemberRef table. It might contain this data: class = TypeSpecToken1, name = ".ctor", blob = <reference to expected signature blob of .ctor>.
TypeSpecToken1 would refer to the first row in the TypeSpec table. From above we know this table stores the instantiations of generic types. In this case, this row would contain a reference to a signature blob for "Foo<int>". So this MemberRefToken1 is really saying we are referencing "Foo<int>.ctor()".
MemberRefToken1 and MemberRefToken2 would share the same class value, i.e. TypeSpecToken1. They would differ, however, on the name and signature blob (MethodRefToken2 would be for "set_SomeProperty"). Likewise, MemberRefToken3 and MemberRefToken4 would share TypeSpecToken2, the instantiation of "Foo<string>", but differ on the name and blob in the same way.
When the JIT compiler compiles the above CIL, it notices that it's seeing a generic instantiation it hasn't seen before (i.e. Foo<int> or Foo<string>). What happens next is covered pretty well by Shiv Kumar's answer, so I won't repeat it in detail here. Simply put, when the JIT compiler encounters a new instantiated generic type, it may emit a whole new type into its type system with a field layout using the actual types in the instantiation in place of the generic parameters. They would also have their own method tables and JIT compilation of each method would involve replacing references to the generic parameters with the actual types from the instantiation. It's also the responsibility of the JIT compiler to enforce correctness and verifiability of the CIL.
So to sum up: C# compiler emits metadata describing what's generic and how generic types/methods are instantiated. The JIT compiler uses this information to emit new types (assuming it isn't compatible with an existing instantiation) at runtime for instantiated generic types and each type will have its own copy of the code that has been JIT compiled based on the actual types used in the instantiation.
Hopefully this made sense in some small way.
Upvotes: 19
Reputation: 9799
For value types, there is a specific "class" defined at run time for each value type generic class. For reference types there is only one class definition that is reused across the different types.
I'm simplifying here, but that's the concept.
Design and Implementation of Generics for the NET Common Language Runtime
Our scheme runs roughly as follows: When the runtime requires a particular instantiation of a parameterized class, the loader checks to see if the instantiation is compatible with any that it has seen before; if not, then a field layout is determined and new vtable is created, to be shared between all compatible instantiations. The items in this vtable are entry stubs for the methods of the class. When these stubs are later invoked, they will generate (“just- in-time”) code to be shared for all compatible instantiations. When compiling the invocation of a (non-virtual) polymorphic method at a particular instantiation, we first check to see
if we have compiled such a call before for some compatible instantiation; if not, then an entry stub is generated, which will in turn generate code to be shared for all compatible instantiations. Two instantiations are compatible if for any parameterized class its compilation at these instantiations gives rise to identical code and other execution structures (e.g. field layout and GC tables), apart from the dictionaries described below in Section 4.4. In particular, all reference types are compatible with each other, because the loader and JIT compiler make no distinction for the purposes of field layout or code generation. On the implementation for the Intel x86, at least, primitive types are mutually incompatible, even if they have the same size (floats and ints have different parameter passing conventions). That leaves user-defined struct types, which are compatible if their layout is the same with respect to garbage collection i.e. they share the same pattern of traced pointers.
http://research.microsoft.com/pubs/64031/designandimplementationofgenerics.pdf
Upvotes: 10