Reputation: 3870
I have to parse text in which "with" is a keyword if it is not surrounded by square brackets. I need to match the keyword "with", and there must be word boundaries on both sides of it.
Here are some examples where with is NOT a keyword:
Here are some examples where with IS a keyword:
Can anyone help? Thanks in advance.
Upvotes: 12
Views: 11971
Reputation: 336088
You can look for the word "with" and check that the closest bracket to its left is not an opening bracket, and that the closest bracket to its right is not a closing bracket:
using System.Text.RegularExpressions;

Regex regexObj = new Regex(
    @"(?<!      # Assert that we can't match this before the current position:
       \[       #  An opening bracket
       [^[\]]*  #  followed by any other characters except brackets.
      )         # End of lookbehind.
      \bwith\b  # Match ""with"".
      (?!       # Assert that we can't match this after the current position:
       [^[\]]*  #  Any text except brackets
       \]       #  followed by a closing bracket.
      )         # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matched text: matchResults.Value
    // match start:  matchResults.Index
    // match length: matchResults.Length
    matchResults = matchResults.NextMatch();
}
The lookaround expressions don't stop at line breaks; if you want each line to be evaluated separately, use [^[\]\r\n]* instead of [^[\]]*.
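To see what the line-restricted variant changes, here is a minimal C# sketch, assuming top-level statements (C# 9 / .NET 5 or later). It runs a compact, single-line form of the pattern above (no IgnorePatternWhitespace) in both versions against a bracket pair that spans three lines. The sample text and the names anyText, perLine and Count are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Original lookarounds: [^[\]]* may run across line breaks.
string anyText = @"(?<!\[[^[\]]*)\bwith\b(?![^[\]]*\])";
// Line-restricted lookarounds: [^[\]\r\n]* stops at CR/LF, so each line is judged on its own.
string perLine = @"(?<!\[[^[\]\r\n]*)\bwith\b(?![^[\]\r\n]*\])";

// Hypothetical sample: the brackets open on line 1 and close on line 3.
string text = "[a\nwith\nb]";

// Number of keyword matches the given pattern finds in the input.
int Count(string pattern, string input) => Regex.Matches(input, pattern).Count;

Console.WriteLine($"across lines: {Count(anyText, text)}");  // 0: "with" is inside [ ... ]
Console.WriteLine($"per line:     {Count(perLine, text)}");  // 1: line 2 has no brackets of its own
```

Which version you want depends on whether a bracket pair in your input is allowed to span several lines.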
Upvotes: 19
Reputation: 75222
I think the simplest solution is to preemptively match balanced pairs of brackets and their contents to get them out of the way as you search for the keyword. Here's an example:
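using System;
using System.Text.RegularExpressions;

string s =
@"[with0]
[ with0 ]
[sometext with0 sometext]
[sometext with0]
[with0 sometext]
with1
] with1
hello with1
hello with1 world
hello [ world] with1 hello
hello [ world] with1 hello [world]";
// [^][] is a negated class of ']' and '[' (a ']' right after '[^' is literal in .NET).
// Bracket pairs are matched and ignored; only a bare "with" fills the KEYWORD group.
Regex r = new Regex(@"\[[^][]*\]|(?<KEYWORD>\bwith\d\b)");
foreach (Match m in r.Matches(s))
{
    if (m.Groups["KEYWORD"].Success)
    {
        Console.WriteLine(m.Value);
    }
}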
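The same match-and-discard idea works without the test-only \d suffix. Below is a minimal C# sketch, assuming top-level statements (C# 9 / .NET 5 or later); the helper name KeywordIndexes is hypothetical, and the class is written as [^[\]] instead of [^][] (both exclude the two bracket characters in .NET). It returns the start index of each free-standing "with", so a line such as "[with] with" yields only the second occurrence. Like the original, it assumes brackets are not nested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Alternative 1 consumes a whole [...] pair, so any "with" inside it is skipped.
// Alternative 2 captures a stand-alone "with" in the KEYWORD group.
var keyword = new Regex(@"\[[^[\]]*\]|(?<KEYWORD>\bwith\b)");

// Hypothetical helper: start positions of "with" outside square brackets.
List<int> KeywordIndexes(string input) =>
    keyword.Matches(input)
           .Cast<Match>()
           .Where(m => m.Groups["KEYWORD"].Success)
           .Select(m => m.Index)
           .ToList();

Console.WriteLine(string.Join(", ", KeywordIndexes("[with] with")));                       // 7
Console.WriteLine(string.Join(", ", KeywordIndexes("hello [ world] with hello [world]"))); // 15
```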
Upvotes: 1
Reputation: 28698
Nice question. I think it'll be easier to find the matches where your [with] pattern applies, and then invert the result.
You need to match [, not followed by ], followed by with (and then the corresponding pattern for the closing square bracket).
Matching the [ and the with is easy:
\[with
Add a lookahead to exclude ], and also allow any number of other characters (.*):
\[(?!]).*with
Then add the corresponding closing square bracket, i.e. the reverse, using a lookbehind:
\[(?!]).*with.*\](?<!\[)
A bit more tweaking:
\[(?!(.*\].*with)).*with.*\](?<!(with.*\[.*))
And now, if you invert this, you should have your desired result (i.e. when this returns true, your bracketed pattern matches, and you want to exclude those results).
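Here is a minimal C# sketch of the "invert the result" step, assuming top-level statements (C# 9 / .NET 5 or later) and one candidate per line; the sample lines and the name HasKeyword are hypothetical. A line is treated as containing the keyword when it contains "with" but the final bracket pattern from this answer does not match it. The pattern has no \b, and the check works on whole lines, so it classifies lines rather than locating individual matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Final pattern from this answer: matches a "with" enclosed by "[" ... "]".
var inBrackets = new Regex(@"\[(?!(.*\].*with)).*with.*\](?<!(with.*\[.*))");

// Invert the result: the line has "with" but the bracketed pattern does not match.
bool HasKeyword(string line) => line.Contains("with") && !inBrackets.IsMatch(line);

string[] lines =
{
    "[with]", "[ with ]", "[sometext with sometext]",
    "with", "] with", "hello [ world] with hello", "hello [ world] with hello [world]",
};

foreach (var line in lines)
    Console.WriteLine($"{HasKeyword(line),-5} {line}");
```

.NET supports the variable-length lookbehind at the end of this pattern; many other regex engines do not.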
Upvotes: 3
Reputation: 480
You'll want to look into both negative lookbehinds and negative lookaheads; they will help you match your data without consuming the brackets.
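A minimal C# sketch of the idea, assuming top-level statements (C# 9 / .NET 5 or later); the sample strings are hypothetical. The negative lookbehind and lookahead inspect the neighbouring characters without consuming them, so the match value is just "with". This simple form only checks the character directly next to the word, so "[ with ]" still matches; allowing other text between the bracket and the word needs a variable-length lookbehind such as the one in the first answer.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<!\[) and (?!\]) are zero-width: they check the neighbours but do not consume them.
var adjacent = new Regex(@"(?<!\[)\bwith\b(?!\])");

foreach (var s in new[] { "] with", "hello with", "[with]", "[ with ]" })
{
    Match m = adjacent.Match(s);
    // When there is a match, m.Value is only "with"; no bracket is part of it.
    string shown = m.Success ? m.Value : "(no match)";
    Console.WriteLine($"{s,-12} -> {shown}");
}
```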
Upvotes: 0