Bikash K C

Reputation: 13

Regular expression to select the content inside a tag if it has the nested tags of the same type

I want to remove the nested tags that are left over after various DOM manipulations on an HTML string, so that the final string is clean but still behaves the same way. I am using a regular expression to select the content inside the main tag and then replace the nested tags with "". The problem is that I cannot get a regular expression that selects the content inside the main tag.

Example html string: <em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>

In this instance I am only focusing on the strong tag, although there are nested em tags at the beginning. The reason is that the regular expression that matches the content inside the em tag does not match the desired content inside the strong tag.

Desired selection: fff<em>gg</em>hh<strong>iii</strong>jjj

This selection is desired because a strong tag is present inside the main strong tag. The first strong tag, i.e. <strong>ccc<em>ddddd</em></strong>, is ignored because it is contained inside the em tag and has no strong tag nested in it. I only want the content if the tag has a nested tag of the same type.

I wrote a few regular expressions but the closest I could get was by using a regular expression: /(?<=<strong>)(?!\w*<\/strong>).*?<strong>.*?<\/strong>.*?(?=<\/strong>)/g.

But this works only if the closing strong tag has nothing but word characters before it. That is, it works on the following string: <em>a<em>bbb<strong>ccc</strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>.

But it does not work on the string: <em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>. The reason is clearly the presence of non-word characters before the closing strong tag. So I tried replacing \w* with .*? to match any character before the closing strong tag, but this did not work either.

Upvotes: 1

Views: 58

Answers (1)

Armali

Reputation: 19375

… the closest I could get was by using a regular expression: /(?<=<strong>)(?!\w*<\/strong>).*?<strong>.*?<\/strong>.*?(?=<\/strong>)/g.

… only one level of nesting like in the example is the level of nesting I need to handle.

The part of your expression that doesn't work right is (?!\w*<\/strong>).*?. We want to bar a closing strong tag anywhere in this stretch; this can be achieved by replacing that part with ((?!<\/strong>).)*, which repeats the lookahead check before every character it consumes.
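To make the difference concrete, here is a small comparison sketch on the failing string from the question. The variable names (s, original, attempt, fixed) are only illustrative, and the lookbehind needs an engine with ES2018 regex support.

```javascript
const s = '<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em>' +
          '<strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>';

// Pattern from the question: the lookahead only rejects "word characters
// followed by </strong>", so it passes at "ccc<em>..." and the lazy .*?
// then runs across the first </strong> up to the outer <strong>.
const original = /(?<=<strong>)(?!\w*<\/strong>).*?<strong>.*?<\/strong>.*?(?=<\/strong>)/g;

// The question's second attempt (\w* replaced by .*?): the lookahead now
// rejects every start position that has any </strong> later in the string.
const attempt = /(?<=<strong>)(?!.*?<\/strong>).*?<strong>.*?<\/strong>.*?(?=<\/strong>)/g;

// Suggested fix: the lookahead is re-checked before each consumed character,
// so the first stretch can never cross a </strong>.
const fixed = /(?<=<strong>)((?!<\/strong>).)*<strong>.*?<\/strong>.*?(?=<\/strong>)/g;

console.log(s.match(original)); // one match, but it starts too early, at "ccc"
console.log(s.match(attempt));  // null
console.log(s.match(fixed));    // [ 'fff<em>gg</em>hh<strong>iii</strong>jjj' ]
```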

// Run the corrected pattern against the strings from the question.
// `const` avoids creating an implicit global (an error in strict mode and in modules).
for (const x of ['<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>',
                 '<em>a<em>bbb<strong>ccc</strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>',
                 '<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>'])
    console.log(x.match(/(?<=<strong>)((?!<\/strong>).)*<strong>.*?<\/strong>.*?(?=<\/strong>)/g))
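Since the stated goal is to replace the nested tags with "", the same pattern could be fed to String.prototype.replace with a callback. This is only a minimal sketch: stripNestedStrong is a hypothetical helper name, and it assumes a single level of nesting, as stated in the question's comments.

```javascript
// Hypothetical helper, not part of the original answer.
// Assumes at most one level of <strong> nesting inside the outer <strong>.
const NESTED_STRONG = /(?<=<strong>)((?!<\/strong>).)*<strong>.*?<\/strong>.*?(?=<\/strong>)/g;

function stripNestedStrong(html) {
  // Each match is the body of an outer <strong> that contains a nested
  // <strong>; drop the inner opening and closing tags but keep their text.
  return html.replace(NESTED_STRONG, (body) => body.replace(/<\/?strong>/g, ''));
}

const input = '<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em>' +
              '<strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>';

console.log(stripNestedStrong(input));
// <em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hhiiijjj</strong>
```

The strong tag inside the em tags is left alone because its body never matches the pattern. For deeper or irregular nesting, a regular expression is unlikely to be enough and a DOM-based cleanup would be the safer route.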

Upvotes: 0
