Reputation: 876
I need to verify if a string contains "error" or "exception" in it, excluding certain keywords: "exception1", "exception2", "includeException", "error1".
This regex seems to do the job:
\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)(exception|error)\w*\b
It correctly returns 2 matches when run against the following string:
Test string: "exception1 exception2 exception3 includeException error1 error2"
Matches: "exception3", "error2"
However, if I set the RegexOptions.IgnoreCase flag or add "(?i)" at the beginning of the regex, it also returns a match for "includeException".
What am I missing here?
Upvotes: 1
Views: 98
Reputation: 1164
Regex is not very readable... how about a pure C# solution?
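public static class StringExtensions
{
    // Returns true when input contains "error" or "exception" and none of the excluded keywords.
    // Note: the check is case-sensitive and looks at the whole string, not at individual words.
    public static bool ContainsErrorOrExceptionExcept(this string input, string[] excludedKeywords)
    {
        if (!input.Contains("error") && !input.Contains("exception"))
        {
            return false;
        }

        // Any excluded keyword anywhere in the input rejects it.
        foreach (string keyword in excludedKeywords)
        {
            if (input.Contains(keyword))
            {
                return false;
            }
        }
        return true;
    }
}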
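For illustration, here is a minimal sketch of how the method above behaves on the question's test string. It assumes C# 9 or later (top-level statements) and re-declares the same logic as a local function so the snippet runs on its own. Applying it word by word is an assumption about the intended use, not something the method does by itself.

```csharp
using System;
using System.Linq;

// Same logic as the method above, written as a local function (not an extension method).
bool ContainsErrorOrExceptionExcept(string input, string[] excludedKeywords)
{
    if (!input.Contains("error") && !input.Contains("exception"))
    {
        return false;
    }
    return !excludedKeywords.Any(keyword => input.Contains(keyword));
}

string[] excluded = { "exception1", "exception2", "includeException", "error1" };
string text = "exception1 exception2 exception3 includeException error1 error2";

// Whole string: rejected, because the text contains excluded keywords.
bool wholeString = ContainsErrorOrExceptionExcept(text, excluded);

// Word by word (assumed usage): keep only the words that pass the check.
string[] words = text.Split(' ')
    .Where(word => ContainsErrorOrExceptionExcept(word, excluded))
    .ToArray();

Console.WriteLine(wholeString);               // False
Console.WriteLine(string.Join(", ", words));  // exception3, error2
```

Note that string.Contains is case-sensitive here, so "includeException" is skipped only because it contains "Exception" with a capital E, not because of the exclusion list.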
Upvotes: 2
Reputation: 627100
I see two main issues with your regex.
First, the \w* subpatterns are placed on both sides of the lookaheads, which largely cancels the effect of the lookaheads: the leading \w* can consume the start of a word, so the lookaheads end up being tested in the middle of it.
Second, the problem with case-insensitivity is described in Berin's answer: you want to match the word exception, and includeException contains that substring.
So, a possible solution is to add a leading word boundary to the (error|exception) pattern:
\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)\b(exception|error)\w*\b
                                                               ^^ added \b
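A quick way to check this variant is a small sketch like the one below. It assumes C# 9 or later (top-level statements); FindWords is a hypothetical helper introduced only for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every matched value as a string.
string[] FindWords(string pattern, string input, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

string boundaryPattern =
    @"\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)\b(exception|error)\w*\b";
string text = "exception1 exception2 exception3 includeException error1 error2";

// With IgnoreCase, includeException is no longer matched.
string[] ignoreCaseMatches = FindWords(boundaryPattern, text, RegexOptions.IgnoreCase);
Console.WriteLine(string.Join(", ", ignoreCaseMatches)); // exception3, error2

// Trade-off: the keyword must now start the word, so "myerror2" is not matched.
string[] prefixedMatches = FindWords(boundaryPattern, "myerror2");
Console.WriteLine(prefixedMatches.Length); // 0
```

The extra \b forces the leading \w* to match nothing, which is why words that merely contain the keyword after a prefix stop matching.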
However, if you need to match words containing error or exception that ARE NOT EQUAL to specific keywords, use
\b(?!(?:exception1|exception2|includeException|error1)\b)\w*(exception|error)\w*\b
Here, the lookahead is anchored to the leading word boundary, so it is checked only once after each word boundary rather than at every position inside a word. Certainly, you can contract it further: \b(?!(?:exception[12]|includeException|error1)\b)\w*(exception|error)\w*\b
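As a rough check of the "not equal" pattern, the sketch below runs the contracted version against the question's test string with RegexOptions.IgnoreCase. It assumes C# 9 or later (top-level statements); FindWords is a hypothetical helper, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every matched value as a string.
string[] FindWords(string pattern, string input, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

string notEqualPattern =
    @"\b(?!(?:exception[12]|includeException|error1)\b)\w*(exception|error)\w*\b";
string text = "exception1 exception2 exception3 includeException error1 error2";

// The lookahead rejects a word only when the whole word equals an excluded keyword.
string[] notEqualMatches = FindWords(notEqualPattern, text, RegexOptions.IgnoreCase);

Console.WriteLine(string.Join(", ", notEqualMatches)); // exception3, error2
```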
Now, if you need to match words containing error or exception that DO NOT CONTAIN specific keywords, use
\b(?!\w*(?:exception1|exception2|includeException|error1))\w*(exception|error)\w*\b
All regex patterns used here are tested at regexhero.net
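To make the difference between the "not equal" and "do not contain" variants concrete, here is a small sketch that runs both on the same input. It assumes C# 9 or later (top-level statements); FindWords and the sample input "myerror2 error1x" are made up for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every matched value as a string.
string[] FindWords(string pattern, string input, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

string notEqualPattern =
    @"\b(?!(?:exception1|exception2|includeException|error1)\b)\w*(exception|error)\w*\b";
string notContainPattern =
    @"\b(?!\w*(?:exception1|exception2|includeException|error1))\w*(exception|error)\w*\b";

// "error1x" is not equal to "error1", but it does contain it.
string sample = "myerror2 error1x";

string[] notEqualMatches = FindWords(notEqualPattern, sample);
string[] notContainMatches = FindWords(notContainPattern, sample);

Console.WriteLine(string.Join(", ", notEqualMatches));   // myerror2, error1x
Console.WriteLine(string.Join(", ", notContainMatches)); // myerror2
```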
Upvotes: 2
Reputation: 11463
Using a good Regex tester can help you figure out what's actually being matched. I used this one:
In the results where it highlights the matches, there is a small button with an 'i' for information. The reason it matches includeException when it's case-insensitive is that you are matching the latter half of the word. The regex doesn't require white space separating the words.
Your regex would also match with case-insensitivity off if includeException were written as includeexception, because your positive match (exception|error) matches the last half. You can also see this when you start removing spaces: exception1exception2 doesn't match, but exception1exception2exception3 does.
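If you don't have a regex tester at hand, the behaviour described above can be reproduced with a short sketch like this one. It assumes C# 9 or later (top-level statements); FindWords is a hypothetical helper written for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every matched value as a string.
string[] FindWords(string pattern, string input, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

// The regex from the question, unchanged.
string original =
    @"\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)(exception|error)\w*\b";
string text = "exception1 exception2 exception3 includeException error1 error2";

string[] caseSensitive = FindWords(original, text);
string[] caseInsensitive = FindWords(original, text, RegexOptions.IgnoreCase);
Console.WriteLine(string.Join(", ", caseSensitive));   // exception3, error2
Console.WriteLine(string.Join(", ", caseInsensitive)); // exception3, includeException, error2

// Removing spaces: the leading \w* skips past the excluded prefixes.
string[] twoJoined = FindWords(original, "exception1exception2");
string[] threeJoined = FindWords(original, "exception1exception2exception3");
Console.WriteLine(twoJoined.Length);               // 0
Console.WriteLine(string.Join(", ", threeJoined)); // exception1exception2exception3
```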
While Regex is very compact, there are several ways to get it wrong. A straightforward approach might be a better solution in this case.
Changing your regex to remove the last * quantifier will make what you have work the way you want on your test string:
\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)(exception|error)\w\b
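Here is a small sketch to check this variant, assuming C# 9 or later (top-level statements); FindWords is a hypothetical helper and the second input is made up for this example. One thing it seems to show is that the trailing \w\b requires exactly one word character after the keyword, so a bare "exception" or a longer suffix like "error22" is not matched.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every matched value as a string.
string[] FindWords(string pattern, string input, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

string singleSuffixPattern =
    @"\b\w*(?!exception1)(?!exception2)(?!includeException)(?!error1)(exception|error)\w\b";
string text = "exception1 exception2 exception3 includeException error1 error2";

// On the question's test string, includeException is no longer matched.
string[] testStringMatches = FindWords(singleSuffixPattern, text, RegexOptions.IgnoreCase);
Console.WriteLine(string.Join(", ", testStringMatches)); // exception3, error2

// Words with no suffix, or with more than one suffix character, are skipped.
string[] otherMatches = FindWords(singleSuffixPattern, "exception error22");
Console.WriteLine(otherMatches.Length); // 0
```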
Upvotes: 3