Reputation: 57899
What I'm trying to do: remove the innermost unescaped square brackets surrounding a specific, unescaped character (\ is the escape character).
input: [\[x\]]\]\[[\[y\]]
output when looking for brackets around y: [\[x\]]\]\[\[y\]
output when looking for brackets around x: \[x\]\]\[[\[y\]]
In short, remove only the unescaped set of brackets around the specific character.
I tried this (for y): Regex.Replace(input, @"(?<!\\)\[(.*?(?<!\\)y.*?)(?<!\\)\]", @"$1"), but that seems to match the first unescaped [ (before the x) with the last ]. I figured I could replace the . wildcards with a negated character class to exclude [ and ], but what I really need to negate is the unescaped versions of these, and when I try to incorporate a negative lookbehind like (?<!\\) in the negated character class, I seem to match nothing at all.
Thanks in advance for your time and effort.
To clarify, the contents of the unescaped square brackets can be anything (except another unescaped square bracket), as long as they contain the unescaped character of interest (y). All the content of the brackets should remain.
Upvotes: 1
Views: 1740
Reputation: 10579
Edited after question was edited
Regex.Replace(input, @"((?<!\\)\[(?=((\\\[)|[^[])*((?<!\\)y)))|((?<=[^\\]y((\\\]|[^]]))*)(?<!\\)\])","");
We want to match the brackets to be removed:
(?<!\\)\[ - Match is an unescaped left bracket
(?=((\\\[)|[^[])*((?<!\\)y)) - Match is followed by any number of (escaped left brackets or non-left brackets) followed by an unescaped y
| - OR
(?<=[^\\]y((\\\]|[^]]))*) - Match is preceded by unescaped y followed by any number of (escaped right brackets or non-right brackets)
(?<!\\)\] - Match is an unescaped right bracket
Upvotes: 1
Reputation: 75222
Lookbehind is the wrong tool for this job. Try this instead:
Regex r = new Regex(
@"\[((?>(?:[^y\[\]\\]|\\.)*)y(?>(?:[^\[\]\\]|\\.)*))\]");
string s1 = @"[\[x\]]\]\[[\[y\]]";
Console.WriteLine(s1);
Console.WriteLine(r.Replace(s1, @"%$1%"));
Console.WriteLine();
string s2 = @"[\[x\]]\]\[[1234(\[abcycba\]\y\y)]";
Console.WriteLine(s2);
Console.WriteLine(r.Replace(s2, @"%$1%"));
result:
[\[x\]]\]\[[\[y\]]
[\[x\]]\]\[%\[y\]%
[\[x\]]\]\[[1234(\[abcycba\]\y\y)]
[\[x\]]\]\[%1234(\[abcycba\]\y\y)%
(I replaced the brackets with % instead of deleting them to make it easier to see exactly what's getting replaced.)
(?:\\.|[^y\[\]\\])* matches zero or more of (1) a backslash followed by any character, or (2) anything that's not a 'y', a square bracket or a backslash. If the next character is a 'y', it gets consumed and (?:\\.|[^\[\]\\])* matches any remaining characters until the next unescaped bracket. Including both brackets in the negated character class (along with the backslash) ensures that you only match the innermost set of unescaped brackets.
It's also vital that you use atomic groups, i.e., (?>...); this prevents backtracking that we know would be useless and that could cause serious performance problems when the regex is used on strings that contain no matches.
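To connect this back to the question, here is a minimal sketch that wraps the same pattern in a helper and deletes the brackets (replacement "$1") instead of marking them with %. The name RemoveBracketsAround is hypothetical, the target is assumed to be a letter or digit so that it needs no escaping inside the pattern, and the code is written as top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): builds the pattern
// shown above for an arbitrary target character and removes the brackets.
// Assumption: target is a letter or digit, so it can be dropped into the
// pattern and the character class without escaping.
static string RemoveBracketsAround(string text, char target)
{
    if (!char.IsLetterOrDigit(target))
        throw new ArgumentException("target must be a letter or digit", nameof(target));

    string pattern =
        $@"\[((?>(?:[^{target}\[\]\\]|\\.)*){target}(?>(?:[^\[\]\\]|\\.)*))\]";

    // "$1" keeps the bracket contents and drops the enclosing brackets.
    return Regex.Replace(text, pattern, "$1");
}

string sample = @"[\[x\]]\]\[[\[y\]]";
Console.WriteLine(RemoveBracketsAround(sample, 'y')); // [\[x\]]\]\[\[y\]
Console.WriteLine(RemoveBracketsAround(sample, 'x')); // \[x\]\]\[[\[y\]]
```

Both lines match the outputs requested in the question.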
An alternative approach would use a lookahead to assert the presence of the 'y' and then use the much simpler (?>(?:\\.|[^\[\]\\])*) to consume the characters between the brackets. The problem is that you're now making two passes over the string, and it can be tricky making sure the lookahead doesn't look too far ahead, or not far enough. Doing all the work in one pass makes it much easier to keep track of where you are at each stage of the matching process.
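For comparison, the lookahead alternative could look roughly like the sketch below. This is one possible reading of the description, not code from the answer: the lookahead reuses the "anything but y or an unescaped bracket" group to make sure it stops at the next unescaped bracket, and the variable names are made up for illustration. It is written as top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// The lookahead asserts that an unescaped 'y' occurs before the next
// unescaped bracket; the capturing group then makes a second pass to
// consume everything up to the closing bracket.
var lookaheadRegex = new Regex(
    @"\[(?=(?>(?:[^y\[\]\\]|\\.)*)y)((?>(?:[^\[\]\\]|\\.)*))\]");

string sample = @"[\[x\]]\]\[[\[y\]]";
Console.WriteLine(lookaheadRegex.Replace(sample, "$1")); // [\[x\]]\]\[\[y\]
```

The stretch between the opening bracket and the 'y' is scanned twice here, which is the extra pass the paragraph above warns about.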
Upvotes: 2
Reputation: 5445
Writing a regex for this might be overly complex for the problem. While this function is a bit lengthy, it's conceptually simple and does the trick:
string FixString(char x, string original)
{
    int i = 0;
    string s = original;
    while (i < s.Length)
    {
        if (s[i] == x)
        {
            // Remove the first unescaped ']' after the character.
            bool found = false;
            for (int j = i + 1; (j < s.Length) && !found; j++)
            {
                if ((s[j] == ']') &&
                    (s[j - 1] != '\\'))
                {
                    s = s.Remove(j, 1);
                    found = true;
                }
            }
            if (i > 0)
            {
                // Remove the nearest unescaped '[' before the character.
                found = false;
                for (int j = i - 1; (j >= 0) && !found; j--)
                {
                    if ((s[j] == '[') &&
                        ((j == 0) ||
                         (s[j - 1] != '\\')))
                    {
                        s = s.Remove(j, 1);
                        i--; // the character has shifted one position left
                        found = true;
                    }
                }
            }
        }
        i++;
    }
    return s;
}
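Note that the loop above does not check whether the character itself is escaped, and it only looks one character back when deciding whether a bracket is escaped. As a rough sketch of a different non-regex technique (a single left-to-right pass with a StringBuilder, not the code above), the following treats a backslash as escaping whatever character follows it, which is an assumption borrowed from the \\. in the regex answer. The name RemoveInnermostBrackets is hypothetical, and the code is written as top-level statements (C# 9 or later):

```csharp
using System;
using System.Text;

// Hypothetical single-pass variant: remembers the most recent unescaped '['
// and drops that pair only if the target was seen before the matching ']'.
static string RemoveInnermostBrackets(string text, char target)
{
    var sb = new StringBuilder(text.Length);
    int open = -1;      // position in sb of the currently open unescaped '['
    bool seen = false;  // unescaped target seen since that '['?

    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (c == '\\' && i + 1 < text.Length)
        {
            // Copy the escape sequence verbatim; it never counts as a
            // bracket or as the target.
            sb.Append(c).Append(text[++i]);
        }
        else if (c == '[')
        {
            open = sb.Length;   // a new '[' replaces any outer one
            seen = false;
            sb.Append(c);
        }
        else if (c == ']')
        {
            if (open >= 0 && seen)
                sb.Remove(open, 1); // drop the '[' and skip this ']'
            else
                sb.Append(c);
            open = -1;
            seen = false;
        }
        else
        {
            if (c == target && open >= 0) seen = true;
            sb.Append(c);
        }
    }
    return sb.ToString();
}

string sample = @"[\[x\]]\]\[[\[y\]]";
Console.WriteLine(RemoveInnermostBrackets(sample, 'y')); // [\[x\]]\]\[\[y\]
Console.WriteLine(RemoveInnermostBrackets(sample, 'x')); // \[x\]\]\[[\[y\]]
```

With this sketch an input such as [\y] is left unchanged, because the y is escaped.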
Upvotes: 2