Reputation: 41500
I need a regex for text substitution. Example: the text to be matched is ABC (which could be surrounded by square brackets), and the substitution text is DEF. This is basic enough. The complication is that I don't want to match the ABC text when it is preceded by the pattern \[[\d ]+\]\. (in other words, when it is preceded by a word or set of words in brackets, followed by a period).
Here are some examples of source text and the result after the regex substitution:
1. [xxx xxx].[ABC] > [xxx xxx].[ABC] (does not match - first part fits the pattern)
2. [xxx xxx].ABC > [xxx xxx].ABC (does not match - first part fits the pattern)
3. [xxx.ABC > [xxx.DEF (matches - first part has no closing bracket)
4. [ABC] > [DEF] (matches - no first part)
5. ABC > DEF (matches - no first part)
6. [xxx][ABC] > [xxx][DEF] (matches - no period in between)
7. [xxx]. [ABC] > [xxx]. [DEF] (matches - space in between)
What it comes down to is: how can I specify the preceding pattern that when present as described will prevent a match? What would the pattern be in this case? (C# flavor of regex)
Upvotes: 8
Views: 6331
Reputation: 36143
You want a negative look-behind expression. These look like (?<!pattern), so:
(?<!\[[\d ]+\]\.)\[?ABC\]?
Note that this does not force a matching pair of square brackets around ABC; it just allows for an optional open bracket before and an optional close bracket after. If you wanted to force a matching pair or none, you'd have to use alternation:
(?<!\[[\d ]+\]\.)(?:ABC|\[ABC\])
This uses non-capturing parentheses to delimit the alternation. If you want to actually capture ABC, you can of course turn that into a capture group.
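As a rough sketch of the capture-group variant mentioned above (not necessarily how the final pattern should look), ABC can be wrapped in a named group inside each branch of the alternation. The group name "word", the class name CaptureDemo, and the helper CaptureWord are hypothetical names chosen for illustration; the sample inputs are limited to cases this particular pattern handles as intended.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo {
    // Same alternation as above, with ABC wrapped in a named group.
    // "word" is a hypothetical name; .NET allows both branches of an
    // alternation to share one group name.
    public const string Pattern =
        @"(?<!\[[\d ]+\]\.)(?:(?<word>ABC)|\[(?<word>ABC)\])";

    // Returns the captured word, or null when the pattern does not match.
    public static string CaptureWord(string text) {
        Match m = Regex.Match(text, Pattern);
        return m.Success ? m.Groups["word"].Value : null;
    }

    public static void Main() {
        foreach (string text in new[] { "ABC", "[ABC]", "[123].ABC" }) {
            Console.WriteLine("{0}: {1}", text, CaptureWord(text) ?? "(no match)");
        }
    }
}
```

With the group in place, m.Value still holds the whole match (including any brackets), while m.Groups["word"].Value holds only ABC.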
ETA: The reason the first expression seems to fail is that it is matching on ABC], which is not preceded by the prohibited text. The open bracket [ is optional, so the regex simply leaves it out of the match. The way around this is to shift the optional open bracket [ into the negative look-behind assertion, like so:
(?<!\[[\d ]+\]\.\[?)ABC\]?
An example of what it matches and doesn't:
[123].[ABC]: fail (expected: fail)
[123 456].[ABC]: fail (expected: fail)
[123.ABC: match (expected: match)
    matched: ABC
ABC: match (expected: match)
    matched: ABC
[ABC]: match (expected: match)
    matched: ABC]
[ABC[: match (expected: fail)
    matched: ABC
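To make the ETA explanation concrete, here is a small sketch that runs the first expression and the revised one against the same input and reports where each match starts. The class name LookbehindDemo and the helper Describe are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo {
    // First attempt: the optional "[" sits outside the look-behind.
    public const string Outside = @"(?<!\[[\d ]+\]\.)\[?ABC\]?";
    // Revised version: the optional "[" is moved inside the look-behind.
    public const string Inside = @"(?<!\[[\d ]+\]\.\[?)ABC\]?";

    // Reports the start index and text of the first match, or "fail".
    public static string Describe(string text, string pattern) {
        Match m = Regex.Match(text, pattern);
        return m.Success ? "match at " + m.Index + ": " + m.Value : "fail";
    }

    public static void Main() {
        string text = "[123].[ABC]";
        Console.WriteLine(Describe(text, Outside)); // match at 7: ABC]
        Console.WriteLine(Describe(text, Inside));  // fail
    }
}
```

The first pattern is rejected at index 6 (the open bracket), but at index 7 the text immediately before the match ends in [ rather than a period, so the look-behind passes and ABC] is matched. Moving \[? into the look-behind closes that gap.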
Trying to make the presence of an open bracket [ force a matching close bracket ], as the second pattern intended, is trickier, but this seems to work:
(?:(?<!\[[\d ]+\]\.\[)ABC\]|(?<!\[[\d ]+\]\.)(?<!\[)ABC(?!\]))
An example of what it matches and doesn't:
[123].[ABC]: fail (expected: fail)
[123 456].[ABC]: fail (expected: fail)
[123.ABC: match (expected: match)
    matched: ABC
ABC: match (expected: match)
    matched: ABC
[ABC]: match (expected: match)
    matched: ABC]
[ABC[: fail (expected: fail)
The examples were generated using this code:
// Compile and run with: mcs so_regex.cs && mono so_regex.exe
using System;
using System.Text.RegularExpressions;

public class SORegex {
    public static void Main() {
        // Test inputs and the outcome each one is expected to produce.
        string[] values = {"[123].[ABC]", "[123 456].[ABC]", "[123.ABC", "ABC", "[ABC]", "[ABC["};
        string[] expected = {"fail", "fail", "match", "match", "match", "fail"};

        // Swap the commented-out line in to test the balanced-bracket version.
        string pattern = @"(?<!\[[\d ]+\]\.\[?)ABC\]?"; // Don't force [ to match ].
        //string pattern = @"(?:(?<!\[[\d ]+\]\.\[)ABC\]|(?<!\[[\d ]+\]\.)(?<!\[)ABC(?!\]))"; // Force balanced brackets.
        Console.WriteLine("pattern: {0}", pattern);

        int i = 0;
        foreach (string text in values) {
            Match m = Regex.Match(text, pattern);
            bool isMatch = m.Success;
            Console.WriteLine("{0}: {1} (expected: {2})", text, isMatch ? "match" : "fail", expected[i++]);
            if (isMatch) Console.WriteLine("\tmatched: {0}", m.Value);
        }
    }
}
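The harness above only reports matches. For the substitution the question asks about, a minimal sketch could use Regex.Replace with the look-behind applied to ABC alone, so any surrounding brackets are left in place. Two assumptions here are not from the original answer: the character class is widened from [\d ]+ to [\w ]+ so that the question's "[xxx xxx]" examples count as the prohibited prefix, and the names SubstituteDemo and Substitute are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstituteDemo {
    // Assumption: [\w ]+ instead of [\d ]+, so a bracketed word or set of
    // words (as in the question's examples) blocks the match.
    // Only ABC itself is matched, so brackets around it survive the replace.
    public const string Pattern = @"(?<!\[[\w ]+\]\.\[?)ABC";

    public static string Substitute(string text) {
        return Regex.Replace(text, Pattern, "DEF");
    }

    public static void Main() {
        // The seven examples from the question.
        string[] inputs = {
            "[xxx xxx].[ABC]", "[xxx xxx].ABC", "[xxx.ABC", "[ABC]",
            "ABC", "[xxx][ABC]", "[xxx]. [ABC]"
        };
        foreach (string text in inputs) {
            Console.WriteLine("{0} > {1}", text, Substitute(text));
        }
    }
}
```

This version does not check that brackets are balanced; if that matters, the look-behinds from the balanced-bracket pattern would need to be adapted in the same way.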
Upvotes: 19