Reputation: 23234
NOTE: I went with the regex approach. It was very fast so I stuck with that.
Given the following input
@class using System.Text.Json
@attribute using System
@attribute using System.Text
@attribute required Type theType
@attribute optional bool IsReadOnly = false
@attribute optional bool SomethingElse
I need to create the following objects
new UsingClause(Target.Class, "System.Text.Json");
new UsingClause(Target.Attribute, "System");
new UsingClause(Target.Attribute, "System.Text");
new AttributeProperty(required: true, name: "theType", type: "Type", default: null);
new AttributeProperty(required: false, name: "IsReadOnly", type: "bool", default: "false");
new AttributeProperty(required: false, name: "SomethingElse", type: "bool", default: null);
Ordinarily I'd split the strings by spaces into an array etc, but this will be used in a Roslyn code generator, so I need it to be as fast as possible.
What approach should I use for this kind of script pre-processing task?
I've started down the route of reading each line using a StringReader and then checking it with a compiled Regex. But this whole "ultra efficient" requirement is new to me and I don't want to risk messing it up.
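// The outer (?:...) makes ^\s* and \s*$ apply to all three alternatives;
// without it, | binds looser than the anchors, so only the first and last alternatives are anchored.
private static readonly Regex Regex = new Regex(
    pattern: @"^\s*(?:((@attribute)\s+(using)\s+(.*))|((@attribute)\s+(optional|required)\s+(\w+[\w\.]*)\s+(\w+)(\s*\=\s*(.*))?)|((@class)\s+(using)\s+(.*)))\s*$",
    options: RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);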
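For illustration, here is a minimal sketch of how the StringReader-plus-compiled-Regex approach could be wired up end to end. It is not the code from the question. The shapes of `Target`, `UsingClause` and `AttributeProperty` are assumptions (their definitions are not shown above), `DirectiveParser` is a hypothetical name, and the pattern is a simplified variant that uses named groups rather than the numbered groups of the original.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical shapes for the objects in the question; the real types may differ.
public enum Target { Class, Attribute }

public sealed class UsingClause
{
    public UsingClause(Target target, string ns) { Target = target; Namespace = ns; }
    public Target Target { get; }
    public string Namespace { get; }
}

public sealed class AttributeProperty
{
    public AttributeProperty(bool required, string name, string type, string defaultValue)
    {
        Required = required; Name = name; Type = type; Default = defaultValue;
    }
    public bool Required { get; }
    public string Name { get; }
    public string Type { get; }
    public string Default { get; } // null when the line has no "= value" part
}

public static class DirectiveParser
{
    // Applied to one line at a time, so RegexOptions.Multiline is not needed.
    // Both alternatives sit inside one (?:...) group and share the ^ and $ anchors.
    private static readonly Regex Line = new Regex(
        @"^\s*(?:@(?<target>class|attribute)\s+using\s+(?<ns>\w+(?:\.\w+)*)" +
        @"|@attribute\s+(?<req>required|optional)\s+(?<type>\w+(?:\.\w+)*)\s+(?<name>\w+)(?:\s*=\s*(?<def>.*?))?)\s*$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

    public static (List<UsingClause> Usings, List<AttributeProperty> Properties) Parse(string source)
    {
        var usings = new List<UsingClause>();
        var properties = new List<AttributeProperty>();
        using (var reader = new StringReader(source))
        {
            string text;
            while ((text = reader.ReadLine()) != null)
            {
                Match m = Line.Match(text);
                if (!m.Success) continue; // not a directive; a real generator might report a diagnostic here

                if (m.Groups["ns"].Success)
                {
                    // "@class using X" or "@attribute using X"
                    bool isClass = m.Groups["target"].Value.Equals("class", StringComparison.OrdinalIgnoreCase);
                    usings.Add(new UsingClause(isClass ? Target.Class : Target.Attribute, m.Groups["ns"].Value));
                }
                else
                {
                    // "@attribute required|optional Type Name [= default]"
                    bool required = m.Groups["req"].Value.Equals("required", StringComparison.OrdinalIgnoreCase);
                    string def = m.Groups["def"].Success ? m.Groups["def"].Value : null;
                    properties.Add(new AttributeProperty(required, m.Groups["name"].Value, m.Groups["type"].Value, def));
                }
            }
        }
        return (usings, properties);
    }
}
```

Calling `DirectiveParser.Parse(input)` on the sample input from the question should yield three `UsingClause` entries and three `AttributeProperty` entries. Whether this is fast enough for a given generator is something only a measurement on realistic input can settle.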
Upvotes: 1
Views: 256
Reputation: 3573
It kind of depends on what you consider "efficient", but you should consider writing an ANTLR grammar. The text could then be parsed into C# objects using something like Antlr4.Runtime.Standard, and you could convert those objects to your desired output.
Writing an ANTLR grammar would have the benefit of exposing any inconsistencies or ambiguities in the input 'language'. It would also make it easier to add or change capabilities in the future.
Upvotes: 2