Reputation: 26792
I'm experimenting with Roslyn, parsing and generating C# code. I'm trying to figure out how the CSharpSyntaxTree.ParseText method handles preprocessor symbols.
Here is my test method. It takes some C# code as a string, extracts the using statements, and returns a new string containing those using statements, taking preprocessor directives into account.
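using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

private static string Process(string input, string[] preprocessorSymbols)
{
    // Parse with the given symbols defined, as if passed via #define.
    var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
    var syntaxTree = CSharpSyntaxTree.ParseText(input, options);
    var compilationUnit = (CompilationUnitSyntax)syntaxTree.GetRoot();
    var usings = compilationUnit.Usings.ToArray();
    // Build a new compilation unit containing only the using directives.
    var cs = SyntaxFactory.CompilationUnit()
        .AddUsings(usings)
        .NormalizeWhitespace();
    var result = cs.ToString();
    return result;
}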
When feeding this method with the following input, it works as expected:
var input = "using MyUsing1;\r\nusing MyUsing2;";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result);
When the input contains a preprocessor directive but the CONDITIONAL symbol is not passed to the parser, the result is still as expected (the conditional using statement is stripped):
var input =
"using MyUsing1;\r\n" +
"#if CONDITIONAL\r\n" +
"using MyUsing2;\r\n" +
"#endif";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;", result);
However, when I add the CONDITIONAL preprocessor symbol to the CSharpParseOptions, I get a strange result:
var input =
"using MyUsing1;\r\n" +
"#if CONDITIONAL\r\n" +
"using MyUsing2;\r\n" +
"#endif";
string result = Process(input, new[] { "CONDITIONAL" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result); // fails??
The actual return value is "using MyUsing1;\r\n#if CONDITIONAL\r\nusing MyUsing2;". The #if CONDITIONAL part is retained, and #endif is removed.
Is this a bug, or am I doing something wrong?
Upvotes: 0
Views: 841
Reputation: 45819
In trying to understand this behavior, I added another test case to consider:
var input =
"using MyUsing1;\r\n" +
"#if CONDITIONAL\r\n" +
"using MyUsing2;\r\n" +
"#endif\r\n" +
"using MyUsing3;\r\n";
string result = Process(input, new[] { "CONDITIONAL" });
And in this case, both the #if and the #endif are preserved.
If you break in the debugger and look at the usings array, it appears that each UsingDirectiveSyntax knows both the minimal range of characters for the using statement (Span) and a "wider" range of characters from the original stream (FullSpan), which includes things like, in this case, the #if directive.
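To see this without a debugger, here is a minimal sketch that prints the Span, the FullSpan, and the kinds of leading trivia for each using directive from the question's input. The class name TriviaInspector is made up for illustration. It also prints the leading trivia of the end-of-file token, on the assumption that a trailing #endif with no code after it ends up attached there rather than to any using.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

internal static class TriviaInspector // hypothetical name, for illustration only
{
    private static void Main()
    {
        var input =
            "using MyUsing1;\r\n" +
            "#if CONDITIONAL\r\n" +
            "using MyUsing2;\r\n" +
            "#endif";

        var options = CSharpParseOptions.Default.WithPreprocessorSymbols("CONDITIONAL");
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        foreach (var u in root.Usings)
        {
            // Span excludes leading/trailing trivia; FullSpan includes it.
            Console.WriteLine($"Span={u.Span} FullSpan={u.FullSpan}");
            foreach (var trivia in u.GetLeadingTrivia())
            {
                Console.WriteLine($"  leading trivia: {trivia.Kind()}");
            }
        }

        // Assumption: with nothing after #endif, the directive hangs off the EOF token.
        foreach (var trivia in root.EndOfFileToken.LeadingTrivia)
        {
            Console.WriteLine($"EOF leading trivia: {trivia.Kind()}");
        }
    }
}
```

If the #endif does show up on the end-of-file token, that would account for it disappearing in the question's case: only the using nodes are copied into the new compilation unit, and the #endif was never part of them.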
Digging a little deeper, the docs refer to preceding text such as the preprocessor directive as "leading trivia", and it is attached to the using node as a child.
Interestingly, if you pass .AddUsings() just one of the using directives, it seems to omit the leading trivia; but if you give it an array of multiple UsingDirectiveSyntax nodes, then for each except the first, it includes the leading trivia. (That's probably not exactly right; I'm working from black-box observations only.)
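One possible explanation, offered as a hypothesis rather than something I have confirmed: ToString() on a node returns only the text of its Span, so the leading trivia of the very first token is left out, while ToFullString() returns the text of its FullSpan. The sketch below compares the two on a rebuilt compilation unit; the class name ToStringComparison and the shortened input are made up for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

internal static class ToStringComparison // hypothetical name, for illustration only
{
    private static void Main()
    {
        // The first using carries the #if as leading trivia, the second carries the #endif.
        var input =
            "#if CONDITIONAL\r\n" +
            "using MyUsing2;\r\n" +
            "#endif\r\n" +
            "using MyUsing3;\r\n";

        var options = CSharpParseOptions.Default.WithPreprocessorSymbols("CONDITIONAL");
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        var cs = SyntaxFactory.CompilationUnit().AddUsings(root.Usings.ToArray());

        // ToString() covers the node's Span (no outer leading/trailing trivia).
        Console.WriteLine("ToString():");
        Console.WriteLine(cs.ToString());

        // ToFullString() covers the node's FullSpan (outer trivia included).
        Console.WriteLine("ToFullString():");
        Console.WriteLine(cs.ToFullString());
    }
}
```

If the two outputs differ only in the leading #if line, the "first one is omitted" effect would come from how the result is turned into a string, not from AddUsings itself.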
I'm not going to pretend to understand the reasoning for that behavior. The upshot is that many bits of code that look ok - like your example - will produce troubling output. (If you pass in new[] {usings[0], usings[2], usings[1]} you get even worse-looking output, with the #endif before the #if. But... you know... I guess why would you do that?)
So if you want to use these tools to generate source code to be fed back into a full build pipeline, you could see this as a bug (or at least, a weird behavior that could easily be a source of bugs). If there's an intended usage that would keep you clear of this, I can't find straightforward documentation of it. In this case, you could remove the trivia from the usings before adding them to the output; but in other cases, I would think that might drop something you want to preserve.
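A minimal sketch of the trivia-stripping approach, assuming it is acceptable to lose any comments attached to the using directives along with the preprocessor directives. The names UsingExtractor and ProcessWithoutTrivia are made up for illustration; the rest mirrors the Process method from the question, with WithoutTrivia() applied to each using before it is added.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

internal static class UsingExtractor // hypothetical name, for illustration only
{
    // Variant of Process that drops leading and trailing trivia from each using.
    private static string ProcessWithoutTrivia(string input, string[] preprocessorSymbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        // Strips #if/#endif directives (and comments) attached to each using node.
        var usings = root.Usings.Select(u => u.WithoutTrivia()).ToArray();

        return SyntaxFactory.CompilationUnit()
            .AddUsings(usings)
            .NormalizeWhitespace()
            .ToString();
    }

    private static void Main()
    {
        var input =
            "using MyUsing1;\r\n" +
            "#if CONDITIONAL\r\n" +
            "using MyUsing2;\r\n" +
            "#endif";

        Console.WriteLine(ProcessWithoutTrivia(input, new[] { "CONDITIONAL" }));
    }
}
```

Since NormalizeWhitespace() regenerates the spacing and line breaks, nothing else needs to be put back after the trivia is removed. If you need to keep comments, you would have to filter the trivia list selectively instead of dropping it wholesale.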
Upvotes: 1