Reputation: 243
I'm trying to move parentheses from just inside a certain tag to just outside it, i.e. when there is an opening parenthesis immediately after the opening tag or a closing parenthesis immediately before the closing tag. Example:
<italic>(When a parenthetical sentence stands on its own)</italic>
<italic>(When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own)</italic>
After the replacement, those lines should be:
(<italic>When a parenthetical sentence stands on its own</italic>)
(<italic>When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own</italic>)
However, strings like the next three should stay untouched:
<italic>(When) a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its (own)</italic>
<italic>When a parenthetical sentence stands (on) its own</italic>
But the following strings:
<italic>((When) a parenthetical sentence stands on its own</italic>
<italic>((When) a parenthetical sentence stands on its own)</italic>
<italic>(When) a parenthetical sentence stands on its own)</italic>
<italic>When a parenthetical sentence stands on its (own))</italic>
<italic>(When a parenthetical sentence stands on its (own)</italic>
should become, after the replacement(s):
(<italic>(When) a parenthetical sentence stands on its own</italic>
(<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>When a parenthetical sentence stands on its (own)</italic>)
(<italic>When a parenthetical sentence stands on its (own)</italic>
There could be nested tags inside the <italic>...</italic> tags, and a line can contain multiple <italic>...</italic> strings.
Also, if there is a nested <inline-formula>...</inline-formula> tag inside <italic>...</italic>, it should be ignored.
Can I do this using regex? If not, what other way can I do this?
My approach is this (I am still not sure if it covers all possible cases):
1st step: <italic>( ---> (<italic>
Find <italic>( where the opening tag is not followed by a balanced pair of parentheses that is itself not immediately followed by the closing tag.
The match is allowed only within a single line.
Find what: (<(italic)>)(?!(\((?>(?:(?![()\r\n]).)++|(?3))*+\))(?!</$2\b))(\()
Replace with: $4$1
2nd step: )</italic> ---> </italic>)
Find )</italic> where the closing tag is not preceded by a balanced pair of parentheses that is itself not immediately preceded by the opening tag.
The match is allowed only within a single line.
(\))(?<!(?<!<(italic)>)(\((?>(?:(?![()\r\n]).)++|(?3))*+\)))(</2\b>)
Upvotes: 1
Views: 83
Reputation: 2638
You could do this a few different ways; I would start by defining when a tag is replaceable.
This problem lends itself to a parser that keeps track of the parenthesis state: whether there was a parenthesis at the beginning of the tag text, and how deeply nested the parentheses are at the current point. A parser builds the output constructively instead of searching with a regex and replacing substrings, and it is naturally recursive, which handles nested tags. Doing this with a regex seems a bit convoluted. Here's what I came up with.
using System;
using System.IO;
using System.Text;

namespace ParenParser {
    public class Program
    {
        // Wraps a string in a readable stream so it can be fed to a StreamReader.
        // The writer is deliberately not disposed, since that would close the stream.
        public static Stream GenerateStreamFromString(string s)
        {
            MemoryStream stream = new MemoryStream();
            StreamWriter writer = new StreamWriter(stream);
            writer.Write(s);
            writer.Flush();
            stream.Position = 0;
            return stream;
        }

        // Root: copies plain text through and hands every top-level tag to ProcessTag.
        public static String Process(StreamReader s) {
            StringBuilder output = new StringBuilder();
            while (!s.EndOfStream) {
                var ch = Convert.ToChar(s.Read());
                if (ch == '<') {
                    output.Append(ProcessTag(s, true));
                } else {
                    output.Append(ch);
                }
            }
            return output.ToString();
        }

        // Reads one tag (opening tag, text, closing tag) and returns it with the
        // outer parentheses moved outside the tag where appropriate.
        // skipOpeningBracket is true when the caller has already consumed the '<'.
        public static String ProcessTag(StreamReader s, bool skipOpeningBracket = true) {
            int currentParenDepth = 0; // '(' count minus ')' count in this tag's own text
            StringBuilder openingTag = new StringBuilder(), allTagText = new StringBuilder(), closingTag = new StringBuilder();
            bool inOpeningTag = false, inClosingTag = false;
            if (skipOpeningBracket) {
                inOpeningTag = true;
                openingTag.Append('<');
            }
            while (!s.EndOfStream) {
                var ch = Convert.ToChar(s.Read());
                if (ch == '<') { // start of a tag
                    int next = s.Peek(); // -1 at end of stream, so do not convert to char blindly
                    if (next == '/') { // closing tag!
                        closingTag.Append(ch);
                        inClosingTag = true;
                    } else if (openingTag.Length != 0) { // already seen a tag, recurse
                        allTagText.Append(ProcessTag(s, true));
                        continue;
                    } else {
                        openingTag.Append(ch);
                        inOpeningTag = true;
                    }
                }
                else if (inOpeningTag) {
                    openingTag.Append(ch);
                    if (ch == '>') {
                        inOpeningTag = false;
                    }
                }
                else if (inClosingTag) {
                    closingTag.Append(ch);
                    if (ch == '>') {
                        // Done! Decide which parentheses to move based on the first and
                        // last characters of the text and the final depth.
                        var allTagTextString = allTagText.ToString();
                        if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth == 0) {
                            // balanced: move both
                            return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 2) + closingTag.ToString() + ")";
                        } else if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && currentParenDepth > 0) { // unclosed
                            return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 1) + closingTag.ToString();
                        } else if (allTagTextString.Length > 0 && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth < 0) { // unopened
                            return openingTag.ToString() + allTagTextString.Substring(0, allTagTextString.Length - 1) + closingTag.ToString() + ")";
                        } else {
                            return openingTag.ToString() + allTagTextString + closingTag.ToString();
                        }
                    }
                }
                else
                {
                    // Plain text inside the tag: keep it and track the parenthesis depth.
                    allTagText.Append(ch);
                    if (ch == '(') {
                        currentParenDepth++;
                    }
                    else if (ch == ')') {
                        currentParenDepth--;
                    }
                }
            }
            // Input ended before the tag was closed: return what was read, unchanged.
            return openingTag.ToString() + allTagText.ToString() + closingTag.ToString();
        }

        public static void Main()
        {
            var testCases = new String[] {
                // Should change
                "<italic>(When a parenthetical sentence stands on its own)</italic>",
                "<italic>(When a parenthetical sentence stands on its own</italic>",
                "<italic>When a parenthetical sentence stands on its own)</italic>",
                // Should remain unchanged
                "<italic>(When) a parenthetical sentence stands on its own</italic>",
                "<italic>When a parenthetical sentence stands on its (own)</italic>",
                "<italic>When a parenthetical sentence stands (on) its own</italic>",
                // Should be changed
                "<italic>((When) a parenthetical sentence stands on its own</italic>",
                "<italic>((When) a parenthetical sentence stands on its own)</italic>",
                "<italic>(When) a parenthetical sentence stands on its own)</italic>",
                "<italic>When a parenthetical sentence stands on its (own))</italic>",
                "<italic>(When a parenthetical sentence stands on its (own)</italic>",
                // Other cases
                "<italic>(Try This on!)</italic>",
                "<italic><italic>(Try This on!)</italic></italic>",
                "<italic></italic>",
                "",
                "()",
                "<italic>()</italic>",
                "<italic>"
            };
            foreach (var testCase in testCases) {
                using (var testCaseStreamReader = new StreamReader(GenerateStreamFromString(testCase))) {
                    Console.WriteLine(testCase + " --> " + Process(testCaseStreamReader));
                }
            }
        }
    }
}
The test case results look something like this:
<italic>(When a parenthetical sentence stands on its own</italic> --> (<italic>When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own)</italic> --> <italic>When a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its (own)</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>
<italic>When a parenthetical sentence stands (on) its own</italic> --> <italic>When a parenthetical sentence stands (on) its own</italic>
<italic>((When) a parenthetical sentence stands on its own</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>
<italic>((When) a parenthetical sentence stands on its own)</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own)</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>When a parenthetical sentence stands on its (own))</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>)
<italic>(When a parenthetical sentence stands on its (own)</italic> --> (<italic>When a parenthetical sentence stands on its (own)</italic>
<italic>(Try This on!)</italic> --> (<italic>Try This on!</italic>)
<italic><italic>(Try This on!)</italic></italic> --> (<italic><italic>Try This on!</italic></italic>)
<italic></italic> --> <italic></italic>
-->
() --> ()
<italic>()</italic> --> (<italic></italic>)
<italic> --> <italic>
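One input the list above does not exercise is text such as (When) a sentence (own): it starts with ( and ends with ), and the depth counter ends at zero, but those two parentheses are not a pair, so a depth test alone would move both. Below is a minimal sketch of a pair-matching check that could stand in for the depth test. The names ParenHoist, Decide and Apply are hypothetical and not part of the program above, and the sketch assumes the tag text has already been extracted as a plain string.

```csharp
using System;
using System.Collections.Generic;

public static class ParenHoist
{
    // Decides whether the leading '(' and the trailing ')' of `content`
    // should be moved outside the surrounding tag.
    public static (bool MoveOpen, bool MoveClose) Decide(string content)
    {
        if (content.Length == 0) return (false, false);

        var open = new Stack<int>();   // indices of '(' still waiting for a ')'
        int last = content.Length - 1;
        int partnerOfFirst = -1;       // index of the ')' paired with content[0], if any
        bool lastIsUnmatched = false;  // the final ')' had no '(' to pair with

        for (int i = 0; i < content.Length; i++)
        {
            if (content[i] == '(')
            {
                open.Push(i);
            }
            else if (content[i] == ')')
            {
                if (open.Count == 0)
                {
                    if (i == last) lastIsUnmatched = true;
                }
                else if (open.Pop() == 0)
                {
                    partnerOfFirst = i;
                }
            }
        }

        bool startsWithOpen = content[0] == '(';
        bool endsWithClose = content[last] == ')';
        // The leading '(' never found a partner inside the tag.
        bool firstUnmatched = startsWithOpen && partnerOfFirst == -1;
        // The leading '(' pairs with the very last character: it wraps everything.
        bool wrapsAll = startsWithOpen && partnerOfFirst == last;

        return (firstUnmatched || wrapsAll,
                (endsWithClose && lastIsUnmatched) || wrapsAll);
    }

    // Rebuilds the tag with the chosen parentheses moved outside it.
    public static string Apply(string openTag, string content, string closeTag)
    {
        var (moveOpen, moveClose) = Decide(content);
        int start = moveOpen ? 1 : 0;
        int end = content.Length - (moveClose ? 1 : 0);
        string inner = content.Substring(start, end - start);
        return (moveOpen ? "(" : "") + openTag + inner + closeTag + (moveClose ? ")" : "");
    }
}
```

With this check, ParenHoist.Apply("<italic>", "(When) a sentence (own)", "</italic>") returns the string unchanged, while the text ((When) a sentence) becomes (<italic>(When) a sentence</italic>). The sketch counts every parenthesis in the string it is given, so parentheses inside nested tags such as <inline-formula> would still need to be masked or skipped separately.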
Upvotes: 1