I'm building a lexical analysis engine in C#. For the most part it is done and works quite well. One of the features of my lexer is that it allows any user to input their own regular expressions. This allows the engine to lex all sorts of fun and interesting things and output a tokenised file.
One of the issues I'm having is that I want the user to have everything contained in this tokenised file, i.e. both the parts they are looking for and the parts they are not (partial highlighting would be a good example of this).
Based on the way my lexer highlights, I found the best way to do this would be to negate the regular expressions given by the user.
So if the user wanted to lex a string for every occurrence of "T", the negated version would find everything except "T".
Now, the above is easy to do, but what if a user supplies 8 different expressions of a complex nature? Is there a way to put all these expressions into one and negate the lot?
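You could combine several regexes into one by using an alternation: (pattern1)|(pattern2)|... To negate it, check for !IsMatch. Note that !IsMatch only tells you that the input contains no match at all; it does not give you the unmatched parts.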
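A minimal sketch of building that alternation from an arbitrary list of user patterns. PatternCombiner, Combine and MatchesNone are hypothetical names. Wrapping each pattern in a non-capturing group (?:...) is my own choice so that a | inside one user pattern cannot merge with its neighbours. It assumes the patterns don't rely on numbered backreferences, because group numbers shift once the patterns are concatenated.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternCombiner
{
    // Joins user-supplied patterns into one alternation. Each pattern is
    // wrapped in (?:...) so its own '|' stays local to that pattern.
    public static Regex Combine(IEnumerable<string> patterns) =>
        new Regex(string.Join("|", patterns.Select(p => "(?:" + p + ")")));

    // True when the input contains no match for any of the patterns.
    // This is a yes/no answer for the whole input; it does not return
    // the unmatched text.
    public static bool MatchesNone(Regex combined, string input) =>
        !combined.IsMatch(input);
}
```

Keep in mind that .NET tries the alternatives in order at each position, so the first pattern that matches there wins, not the longest one. The order in which the user's patterns are joined can therefore change the tokens you get.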
var matches = Regex.Matches("aa bb cc dd", @"(?<token>a{2})|(?<token>d{2})");
This returns two matches, "aa" and "dd". Note that I've used the same group name twice; .NET allows that. Also explore Regex.Split. For instance:
var split = Regex.Split("aa bb cc dd", @"(?<token>aa bb)|(?:\s+)");
returns the words as tokens (along with some empty strings), except for "aa bb", which is returned as one token because I defined it that way with (?<token>...).
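Because <token> is a capturing group, Regex.Split copies its text into the result array. The array also gets an empty string wherever a match starts at the beginning of the input or directly after another match. If those empty entries are unwanted, one option is to filter them out. The sketch below assumes that; SplitDemo and Tokens are illustrative names only.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits on whitespace but keeps "aa bb" together as one token.
    // The captured <token> text is included by Regex.Split itself;
    // the Where clause drops the empty strings it also produces.
    public static string[] Tokens(string input) =>
        Regex.Split(input, @"(?<token>aa bb)|(?:\s+)")
             .Where(s => s.Length > 0)
             .ToArray();
}
```

For "aa bb cc dd" this should yield "aa bb", "cc" and "dd".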
You can also use the Index and Length properties to calculate the parts in between that the regex has not recognized:
var matches = Regex.Matches("aa bb cc dd", @"(?<token>a{2})|(?<token>d{2})");
for (int i = 0; i < matches.Count; i++)
{
    // Both alternatives capture into the same "token" group
    var group = matches[i].Groups["token"];
    Console.WriteLine("Token={0}, Index={1}, Length={2}", group.Value, group.Index, group.Length);
}
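To get the "negated" parts directly, walk the matches in order and collect the text between the end of one match and the start of the next. Interleaving these gaps with the matches gives the whole input back, with both the recognised and unrecognised parts. This is a sketch under that assumption; GapFinder and Unmatched are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GapFinder
{
    // Returns the stretches of input that no match covers, with their
    // start positions. Regex.Matches yields non-overlapping matches
    // left to right, so a single pass is enough.
    public static List<(int Index, string Text)> Unmatched(Regex regex, string input)
    {
        var gaps = new List<(int Index, string Text)>();
        int pos = 0; // end of the previous match
        foreach (Match m in regex.Matches(input))
        {
            if (m.Index > pos)
                gaps.Add((pos, input.Substring(pos, m.Index - pos)));
            pos = m.Index + m.Length;
        }
        // Anything after the last match is also unmatched.
        if (pos < input.Length)
            gaps.Add((pos, input.Substring(pos)));
        return gaps;
    }
}
```

With the combined pattern (?<token>a{2})|(?<token>d{2}) on "aa bb cc dd", the only gap is " bb cc " at index 2.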