Don_B

Reputation: 243

Moving opening and closing parentheses out of a certain structure?

I'm trying to move parentheses that sit just inside a certain tag to just outside of it, i.e. when there is an opening parenthesis immediately after the opening tag or a closing parenthesis immediately before the closing tag. Example:

<italic>(When a parenthetical sentence stands on its own)</italic>
<italic>(When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own)</italic>

After the replacement, those lines should be:

(<italic>When a parenthetical sentence stands on its own</italic>)
(<italic>When a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its own</italic>)

However, strings like the next three should stay untouched.

<italic>(When) a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its (own)</italic>
<italic>When a parenthetical sentence stands (on) its own</italic>

But the following strings:

<italic>((When) a parenthetical sentence stands on its own</italic>
<italic>((When) a parenthetical sentence stands on its own)</italic>
<italic>(When) a parenthetical sentence stands on its own)</italic>
<italic>When a parenthetical sentence stands on its (own))</italic>
<italic>(When a parenthetical sentence stands on its (own)</italic>

should become, after the replacement(s):

(<italic>(When) a parenthetical sentence stands on its own</italic>
(<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>When a parenthetical sentence stands on its (own)</italic>)
(<italic>When a parenthetical sentence stands on its (own)</italic>

There could be nested tags inside the <italic>...</italic> tags, and a line can contain multiple <italic>...</italic> strings. Also, if there is a nested <inline-formula>...</inline-formula> tag inside <italic>...</italic>, it should be ignored.

Can I do this using regex? If not, how else can I do this?

My approach is this (I am still not sure if it covers all possible cases):

1st step: <italic>( ---> (<italic>. Find <italic>( unless the opening tag is immediately followed by a matching pair of parentheses that is not itself immediately followed by the closing tag. The match is allowed only within a single line.

Find what: (<(italic)>)(?!(\((?>(?:(?![()\r\n]).)++|(?3))*+\))(?!</$2\b))(\()
Replace with: $4$1

2nd step: )</italic> ---> </italic>). Find )</italic> unless the closing tag is immediately preceded by a matching pair of parentheses that is not itself immediately preceded by the opening tag. The match is allowed only within a single line.

Find what: (\))(?<!(?<!<(italic)>)(\((?>(?:(?![()\r\n]).)++|(?3))*+\)))(</2\b>)

Upvotes: 1

Views: 83

Answers (1)

Ameer

Reputation: 2638

You could do this a few different ways. I would start by defining when a parenthesis can be moved out of a tag.

  1. We can move the opening parenthesis out of the tag if the text in the tag starts with ( and that parenthesis is either closed right before the closing tag or never closed.
  2. We can move the closing parenthesis out of the tag if the text in the tag ends with ) and that parenthesis was either opened right after the opening tag or never opened.
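
The two rules above can be sketched directly as a small helper. This is only an illustration, assuming the text between the tags is plain text with no nested tags; the names ParenRules, Partners and Move are hypothetical and not part of any library. It pairs parentheses with a stack and then checks what each outer parenthesis is paired with.

```csharp
using System;
using System.Collections.Generic;

public static class ParenRules
{
    // For each parenthesis in text, stores the index of its partner,
    // or -1 when it has none. Entries for other characters stay at -1.
    public static int[] Partners(string text)
    {
        var partner = new int[text.Length];
        for (int i = 0; i < partner.Length; i++) partner[i] = -1;

        var open = new Stack<int>(); // indexes of '(' still waiting for a ')'
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                open.Push(i);
            }
            else if (text[i] == ')' && open.Count > 0)
            {
                int j = open.Pop();
                partner[i] = j;
                partner[j] = i;
            }
        }
        return partner;
    }

    // Applies both rules to the text between the tags and returns the rewritten element.
    public static string Move(string openTag, string text, string closeTag)
    {
        if (text.Length == 0) return openTag + closeTag;

        int last = text.Length - 1;
        int[] partner = Partners(text);

        // Rule 1: the leading '(' is unclosed, or closed by the very last character.
        bool moveOpen = text[0] == '(' && (partner[0] == -1 || partner[0] == last);
        // Rule 2: the trailing ')' is unopened, or opened by the very first character.
        bool moveClose = text[last] == ')' && (partner[last] == -1 || partner[last] == 0);

        int start = moveOpen ? 1 : 0;
        int end = moveClose ? last : text.Length; // exclusive end index
        return (moveOpen ? "(" : "") + openTag + text.Substring(start, end - start)
             + closeTag + (moveClose ? ")" : "");
    }
}
```

For example, Move("<italic>", "(When) a sentence)", "</italic>") gives <italic>(When) a sentence</italic>), while text such as (a) and (b) is left alone because neither outer parenthesis pairs with the other.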

This problem lends itself to a parser that keeps track of the parenthesis state: whether there was a parenthesis at the beginning of the tag text, and how deeply nested the parentheses are at the current point. A parser lets us build the output constructively instead of searching with a regex and replacing substrings, and it is naturally recursive, which handles nested tags. Doing this with a regex seems a bit convoluted. Here's what I came up with.

using System;
using System.IO;
using System.Text;

namespace ParenParser {
    public class Program
    {
        public static Stream GenerateStreamFromString(string s)
        {
            MemoryStream stream = new MemoryStream();
            StreamWriter writer = new StreamWriter(stream);
            writer.Write(s);
            writer.Flush();
            stream.Position = 0;
            return stream;
        }

        public static String Process(StreamReader s) { // root
            StringBuilder output = new StringBuilder();
            while (!s.EndOfStream) {
                var ch = Convert.ToChar(s.Read());
                if (ch == '<') {
                    output.Append(ProcessTag(s, true));
                } else {
                    output.Append(ch);
                }
            }

            return output.ToString();
        }

        public static String ProcessTag(StreamReader s, bool skipOpeningBracket = true) {
            int currentParenDepth = 0;
            StringBuilder openingTag = new StringBuilder(), allTagText = new StringBuilder(), closingTag = new StringBuilder();
            bool inOpeningTag = false, inClosingTag = false;
            if (skipOpeningBracket) {
                inOpeningTag = true;
                openingTag.Append('<');
                skipOpeningBracket = false;
            }

            while (!s.EndOfStream) {
                var ch = Convert.ToChar(s.Read());
                if (ch == '<') { // start of a tag
                    // Peek() returns -1 at end of stream, which Convert.ToChar would reject,
                    // so compare the int result directly.
                    if (s.Peek() == '/') { // closing tag!
                        closingTag.Append(ch);
                        inClosingTag = true;
                    } else if (openingTag.ToString().Length != 0) { // already seen a tag, recurse
                        allTagText.Append(ProcessTag(s, true));
                        continue;
                    } else {
                        openingTag.Append(ch);
                        inOpeningTag = true;
                    }
                }
                else if (inOpeningTag) {
                    openingTag.Append(ch);
                    if (ch == '>') {
                        inOpeningTag = false;
                    }
                }
                else if (inClosingTag) {
                    closingTag.Append(ch);
                    if (ch == '>') {
                        // Done!
                        var allTagTextString = allTagText.ToString();
                        if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth == 0) {
                            return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 2) + closingTag.ToString() + ")";
                        } else if (allTagTextString.Length > 0 && allTagTextString[0] == '(' && currentParenDepth > 0) { // unclosed
                            return "(" + openingTag.ToString() + allTagTextString.Substring(1, allTagTextString.Length - 1) + closingTag.ToString();
                        } else if (allTagTextString.Length > 0 && allTagTextString[allTagTextString.Length - 1] == ')' && currentParenDepth < 0) { // unopened
                            return openingTag.ToString() + allTagTextString.Substring(0, allTagTextString.Length - 1) + closingTag.ToString() + ")";
                        } else {
                            return openingTag.ToString() + allTagTextString + closingTag.ToString();
                        }
                    }
                }
                else
                {
                    allTagText.Append(ch);
                    if (ch == '(') {
                        currentParenDepth++;
                    }
                    else if (ch == ')') {
                        currentParenDepth--;
                    }
                }
            }

            return openingTag.ToString() + allTagText.ToString() + closingTag.ToString();
        }

        public static void Main()
        {
            var testCases = new String[] {
                // Should change
                "<italic>(When a parenthetical sentence stands on its own)</italic>",
                "<italic>(When a parenthetical sentence stands on its own</italic>",
                "<italic>When a parenthetical sentence stands on its own)</italic>",

                // Should remain unchanged
                "<italic>(When) a parenthetical sentence stands on its own</italic>",
                "<italic>When a parenthetical sentence stands on its (own)</italic>",
                "<italic>When a parenthetical sentence stands (on) its own</italic>",

                // Should be changed
                "<italic>((When) a parenthetical sentence stands on its own</italic>",
                "<italic>((When) a parenthetical sentence stands on its own)</italic>",
                "<italic>(When) a parenthetical sentence stands on its own)</italic>",
                "<italic>When a parenthetical sentence stands on its (own))</italic>",
                "<italic>(When a parenthetical sentence stands on its (own)</italic>",

                // Other cases
                "<italic>(Try This on!)</italic>",
                "<italic><italic>(Try This on!)</italic></italic>",
                "<italic></italic>",
                "",
                "()",
                "<italic>()</italic>",
                "<italic>"
            };

            foreach(var testCase in testCases) {
                using(var testCaseStreamReader = new StreamReader(GenerateStreamFromString(testCase))) {
                    Console.WriteLine(testCase + " --> " + Process(testCaseStreamReader));
                }
            }
        }
    }
}

The test case results look something like:

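<italic>(When a parenthetical sentence stands on its own)</italic> --> (<italic>When a parenthetical sentence stands on its own</italic>)
<italic>(When a parenthetical sentence stands on its own</italic> --> (<italic>When a parenthetical sentence stands on its own</italic>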
<italic>When a parenthetical sentence stands on its own)</italic> --> <italic>When a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>
<italic>When a parenthetical sentence stands on its (own)</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>
<italic>When a parenthetical sentence stands (on) its own</italic> --> <italic>When a parenthetical sentence stands (on) its own</italic>
<italic>((When) a parenthetical sentence stands on its own</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>
<italic>((When) a parenthetical sentence stands on its own)</italic> --> (<italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>(When) a parenthetical sentence stands on its own)</italic> --> <italic>(When) a parenthetical sentence stands on its own</italic>)
<italic>When a parenthetical sentence stands on its (own))</italic> --> <italic>When a parenthetical sentence stands on its (own)</italic>)
<italic>(When a parenthetical sentence stands on its (own)</italic> --> (<italic>When a parenthetical sentence stands on its (own)</italic>
<italic>(Try This on!)</italic> --> (<italic>Try This on!</italic>)
<italic><italic>(Try This on!)</italic></italic> --> (<italic><italic>Try This on!</italic></italic>)
<italic></italic> --> <italic></italic>
 --> 
() --> ()
<italic>()</italic> --> (<italic></italic>)
<italic> --> <italic>
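
The cases listed above do not include text with two separate parenthesized groups, such as <italic>(a) and (b)</italic>. There the text starts with (, ends with ), and currentParenDepth finishes at 0, so the first branch of ProcessTag moves both parentheses and yields (<italic>a) and (b</italic>). One possible extra guard is sketched below, assuming the tag text contains no nested tags; the names ParenGuard and IsWrappedInOnePair are hypothetical.

```csharp
public static class ParenGuard
{
    // True only when the '(' at index 0 is closed by the ')' at the last index,
    // i.e. the running depth never drops back to 0 before the final character.
    public static bool IsWrappedInOnePair(string text)
    {
        if (text.Length < 2 || text[0] != '(' || text[text.Length - 1] != ')')
        {
            return false;
        }

        int depth = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '(') depth++;
            else if (text[i] == ')') depth--;

            // Reaching depth 0 early means the leading '(' was already closed.
            if (depth == 0 && i < text.Length - 1) return false;
        }
        return depth == 0;
    }
}
```

A check like this could stand in for the currentParenDepth == 0 test in the first branch. The unclosed and unopened branches would need a similar pairing check, since text like (a) b (c also ends with a positive depth even though its leading parenthesis is already closed.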

Upvotes: 1
