Reputation: 4845
This question is about the C# and CIL language specifications, as well as the behavior of Microsoft's and Mono's C# compilers.
I'm building some code analysis tools (no matter what), which operate on CIL.
Looking at a few code samples, I notice that code statements (try/catch, if/else, if/then, loops, ...) generate connected blocks of MSIL.
But I'd like to be sure that I can't write a C# construct which yields non-connected MSIL. More specifically, can I write any C# statement which translates to (something similar to):
IL_0000:
IL_0001:
IL_0002:
// hole
IL_001a:
IL_001b:
I already tried some weird stuff using goto and nested loops, but maybe I'm not as mad as some users would be.
Upvotes: 7
Views: 436
Reputation: 659956
Sure, that's trivially possible. Something like:
static void M(bool x)
{
    if (x)
        return;
    else
        M(x);
    return;
}
If you compile that in debug mode you get
IL_0000: nop
IL_0001: ldarg.0
IL_0002: stloc.0
IL_0003: ldloc.0
IL_0004: brfalse.s IL_0008
IL_0006: br.s IL_0011
IL_0008: ldarg.0
IL_0009: call void A::M(bool)
IL_000e: nop
IL_000f: br.s IL_0011
IL_0011: ret
The if statement goes from 0001 to 0009, and the consequence of the if is a goto to 0011; both return statements are the same code, so there is a "hole" containing a nop and an unconditional branch between the main body of the if and the consequence.
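To make the "hole" concrete, here is a minimal Python sketch that looks for gaps in the set of instructions attributed to one statement. The offsets are taken from the listing above, and the assignment of offsets to the if statement follows the description in the previous paragraph. The function name `find_holes` and both constants are hypothetical, and how your tool attributes instructions to statements is an assumption left outside the sketch.

```python
from typing import Iterable, List

# Offsets of every instruction in M(bool), copied from the listing above.
METHOD_OFFSETS = [0x00, 0x01, 0x02, 0x03, 0x04, 0x06, 0x08, 0x09, 0x0E, 0x0F, 0x11]

# Offsets attributed to the if statement, per the description above:
# the body runs from 0001 to 0009 and the consequence lands on 0011.
IF_STATEMENT = [0x01, 0x02, 0x03, 0x04, 0x06, 0x08, 0x09, 0x11]


def find_holes(method_offsets: List[int], statement_offsets: Iterable[int]) -> List[List[int]]:
    """Return runs of instructions that lie inside the statement's offset
    range but are not attributed to the statement."""
    owned = set(statement_offsets)
    lo, hi = min(owned), max(owned)
    holes: List[List[int]] = []
    current: List[int] = []
    for off in method_offsets:
        if off < lo or off > hi:
            continue  # outside the statement's range entirely
        if off in owned:
            if current:  # an owned instruction closes the open gap
                holes.append(current)
                current = []
        else:
            current.append(off)  # unowned instruction inside the range
    return holes


for hole in find_holes(METHOD_OFFSETS, IF_STATEMENT):
    print("hole:", ", ".join(f"IL_{off:04x}" for off in hole))
# prints: hole: IL_000e, IL_000f
```

The reported gap is the nop at 000e and the branch at 000f, so a tool that assumes each statement maps to one contiguous IL range would already be wrong for this method.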
More generally, you should never assume anything whatsoever about the layout of the IL produced by the C# compiler. The compiler makes no guarantees whatsoever other than that the IL produced will be legal and, if safe, verifiable.
You say you are writing some code analysis tools; as the author of significant portions of the C# analyzer, and someone who worked on third-party analysis tools at Coverity, a word of advice: for the majority of questions you typically want answered about C# programs, the parse tree produced by Roslyn is the entity you wish to analyze, not the IL. The parse tree is a concrete syntax tree; it is one-to-one with every character in the source code. It can be very difficult to map optimized IL back to the original source code, and it can be very easy to produce false positives in an IL analysis.
Put another way: source-to-IL is semantics-preserving but also information-losing; you typically want to analyze the artifact that has the most information in it.
If you must, for whatever reason, operate your analyzer at the IL level, your first task should probably be to find the boundaries of the basic blocks, particularly if you are analyzing reachability properties.
A "basic block" is a contiguous chunk of IL where the end point of the block does not "carry on" to the following instruction -- because it is a branch, return or throw, for instance -- and there are no branches into the block to anywhere except the first instruction.
You can then form a graph of basic blocks for each method, indicating which ones can possibly transfer control to which other blocks. This "raises the level" of your analysis; instead of analyzing the effects of a sequence of IL instructions, now you're analyzing the effects of a graph of basic blocks.
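As a rough illustration of the two steps just described, the following Python sketch splits the debug-mode IL of M(bool) from earlier in this answer into basic blocks and then builds the successor graph. The `Instr` tuple, `split_blocks` and `build_cfg` are hypothetical names invented for this example, not part of any real IL library. The sketch also assumes each instruction has at most one branch target, so switch, leave and exception-handler edges are deliberately left out.

```python
from typing import Dict, List, NamedTuple, Optional


class Instr(NamedTuple):
    offset: int
    opcode: str
    target: Optional[int] = None  # branch target offset, if any


# The debug-mode IL for M(bool), transcribed by hand from the listing above.
METHOD = [
    Instr(0x00, "nop"), Instr(0x01, "ldarg.0"), Instr(0x02, "stloc.0"),
    Instr(0x03, "ldloc.0"), Instr(0x04, "brfalse.s", 0x08),
    Instr(0x06, "br.s", 0x11), Instr(0x08, "ldarg.0"), Instr(0x09, "call"),
    Instr(0x0E, "nop"), Instr(0x0F, "br.s", 0x11), Instr(0x11, "ret"),
]

# Opcodes after which control never continues to the next instruction.
NO_FALLTHROUGH = {"br", "br.s", "ret", "throw", "rethrow"}


def split_blocks(method: List[Instr]) -> List[List[Instr]]:
    """Split a method into basic blocks by finding block leaders."""
    leaders = {method[0].offset}
    for i, ins in enumerate(method):
        if ins.target is not None:
            leaders.add(ins.target)  # a branch target starts a block
        ends_block = ins.target is not None or ins.opcode in NO_FALLTHROUGH
        if ends_block and i + 1 < len(method):
            leaders.add(method[i + 1].offset)  # so does whatever follows
    blocks: List[List[Instr]] = []
    for ins in method:
        if ins.offset in leaders:
            blocks.append([])
        blocks[-1].append(ins)
    return blocks


def build_cfg(blocks: List[List[Instr]]) -> Dict[int, List[int]]:
    """Map each block's first offset to the first offsets of its successors."""
    cfg: Dict[int, List[int]] = {}
    for i, block in enumerate(blocks):
        last = block[-1]
        successors: List[int] = []
        if last.target is not None:
            successors.append(last.target)
        if last.opcode not in NO_FALLTHROUGH and i + 1 < len(blocks):
            fallthrough = blocks[i + 1][0].offset
            if fallthrough not in successors:
                successors.append(fallthrough)
        cfg[block[0].offset] = successors
    return cfg


for start, successors in build_cfg(split_blocks(METHOD)).items():
    print(f"IL_{start:04x} -> {[f'IL_{s:04x}' for s in successors]}")
```

For this method the sketch yields four blocks starting at 0000, 0006, 0008 and 0011, with the block at 0011 (the shared ret) reachable from both 0006 and 0008. A real analyzer would need to add the edges this sketch omits before drawing any reachability conclusions.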
If you say more about what sorts of analysis you're doing I can advise further.
Upvotes: 13
Reputation: 6214
In theory, yes (this comes from my experience). Your analysis tool does not deal with C# directly; it works on IL code only. IL can be produced by anybody: not only by Visual Studio, but also by other language compilers such as Visual Basic and Python.NET... and by obfuscators! Obfuscators are the real culprit: while other compilers try to adhere to the specs, obfuscators do their best to exploit the specs and the target runtime.
Obfuscated code might violate certain common-sense patterns. Consider this case: certain smart obfuscators produce illegal MSIL, but the jitter digests it because the invalid portions happen never to be executed.
When building an analysis tool, you can't handle these cases unless your target is to build a deobfuscator.
Upvotes: 1