Reputation: 13
I want to remove the nested tags that various DOM manipulations leave in an HTML string, so that the final string looks clean but still behaves correctly. I am using a regular expression to select the content inside the main tag and then replace the nested tags with "". The problem is that I cannot get the regular expression to select the content inside the main tag.
Example html string: <em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>
In this instance I am only concerned with the strong tag, even though there are nested em tags at the beginning. The reason is that the regular expression that matches the content inside the em tag does not match the desired content inside the strong tag.
Desired selection: fff<em>gg</em>hh<strong>iii</strong>jjj
This is the desired selection because another strong tag is present inside the main strong tag. The first strong tag, i.e. <strong>ccc<em>ddddd</em></strong>, is ignored because it is contained inside the em tag. I only want the content if it contains a nested tag of the same type.
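If the rule is read as "a strong element that is not inside any other element and that contains another strong element", it can also be checked without a single regex by walking the tags with a depth counter. This is a swapped-in technique, not the regex approach asked about: topLevelNestedContent is a hypothetical helper, and it assumes well-formed tags without attributes (it only recognises forms like <em> and </strong>). Unlike one regex, it is not limited to a single level of nesting.

```javascript
// Hypothetical helper: returns the inner HTML of every <tag> element that is
// not inside any other element and that contains a nested <tag>.
// Assumes well-formed tags without attributes, e.g. <em> and </strong>.
function topLevelNestedContent(html, tag) {
  const tagRe = /<(\/?)(\w+)>/g; // matches an opening or closing tag
  const results = [];
  let depth = 0;         // current nesting depth across all tag names
  let start = -1;        // index just after the open top-level <tag>, or -1
  let sawNested = false; // whether a <tag> occurred inside that element
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const [, slash, name] = m;
    if (slash === '') {
      if (depth === 0 && name === tag) {
        start = tagRe.lastIndex; // a top-level <tag> begins here
        sawNested = false;
      } else if (start !== -1 && name === tag) {
        sawNested = true;        // the same tag nested inside it
      }
      depth++;
    } else {
      depth--;
      if (depth === 0 && start !== -1) {
        // Closing the top-level <tag>: keep its content only if it had a nested <tag>.
        if (sawNested) results.push(html.slice(start, m.index));
        start = -1;
      }
    }
  }
  return results;
}

const example = '<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>';
console.log(topLevelNestedContent(example, 'strong')); // [ 'fff<em>gg</em>hh<strong>iii</strong>jjj' ]
```

The first <strong> is skipped because it opens at depth 2 (inside the two em elements), which matches the behaviour described above.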
I wrote a few regular expressions, but the closest I could get was this one:
/(?<=<strong>)(?!\w*<\/strong>).*?<strong>.*?<\/strong>.*?(?=<\/strong>)/g
But this only works if the closing strong tag has nothing but word characters before it. For example, it works on this string: <em>a<em>bbb<strong>ccc</strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>
But it does not work on this string: <em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>
The reason is clearly the non-word characters (<em>ddddd</em>) before the closing strong tag. So I tried replacing \w* with .*? to match any characters before the closing strong tag, but that did not work either.
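A quick way to reproduce the failure described above is to run the regex from the question, unchanged, on both example strings. This is a sketch; the variable names are only illustrative.

```javascript
// The regex from the question, unchanged.
const original = /(?<=<strong>)(?!\w*<\/strong>).*?<strong>.*?<\/strong>.*?(?=<\/strong>)/g;

const worksOn = '<em>a<em>bbb<strong>ccc</strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>';
const failsOn = '<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>';

// The lookahead rejects "ccc</strong>", so only the desired content is matched.
console.log(worksOn.match(original));

// "ccc<em>" gets past the \w* lookahead, so the lazy .*? runs across </strong>
// to the next <strong>: the single match starts at "ccc" and spans both strong elements.
console.log(failsOn.match(original));
```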
Upvotes: 1
Views: 58
Reputation: 19375
… the closest I could get was by using a regular expression:
/(?<=<strong>)(?!\w*<\/strong>).*?<strong>.*?<\/strong>.*?(?=<\/strong>)/g
… only one level of nesting like in the example is the level of nesting I need to handle.
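The part of your expression that doesn't work right is (?!\w*<\/strong>).*?
The lookahead only rejects a closing strong tag that follows a run of word characters, so at "ccc<em>" it succeeds, and the lazy .*? is then free to run past the </strong> until it reaches the next <strong>. Putting .*? in place of \w* fails the other way: every candidate position has some </strong> after it, so the lookahead rejects all of them. What we want is to bar a closing strong tag anywhere in that stretch; this can be achieved by replacing that part with ((?!<\/strong>).)*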
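The pattern ((?!<\/strong>).)* is commonly called a tempered greedy token: it consumes one character at a time, but only at positions where </strong> does not start. A minimal check (anchored with ^ for illustration only) on the text that follows the first <strong> in the failing string shows where it stops, and that the following <strong> can then never match from "ccc":

```javascript
// Tempered greedy token on its own, anchored at the start for illustration.
const tempered = /^((?!<\/strong>).)*/;

// Text that follows the first <strong> in the failing example string.
const inner = 'ccc<em>ddddd</em></strong>eeee</em></em><strong>fff';

// Stops right before the first "</strong>".
console.log(inner.match(tempered)[0]); // ccc<em>ddddd</em>

// No <strong> occurs in that stretch, so token + <strong> cannot match here.
console.log(/^((?!<\/strong>).)*<strong>/.test(inner)); // false
```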
// Test the corrected pattern on the two example strings from the question.
for (const x of ['<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>',
                 '<em>a<em>bbb<strong>ccc</strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>'])
  // Each logs a single match: fff<em>gg</em>hh<strong>iii</strong>jjj
  console.log(x.match(/(?<=<strong>)((?!<\/strong>).)*<strong>.*?<\/strong>.*?(?=<\/strong>)/g))
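The question's end goal is to replace the nested tags with "". One way to do that with the corrected pattern is to pass a callback to String.prototype.replace. This is a sketch that assumes "nested tags" means the inner <strong> and </strong> markers, with the text they wrap kept; flattenNestedStrong is a hypothetical name, and like the regex it only handles one level of nesting.

```javascript
// Corrected pattern from this answer, used with replace() instead of match().
const nestedStrong = /(?<=<strong>)((?!<\/strong>).)*<strong>.*?<\/strong>.*?(?=<\/strong>)/g;

// Hypothetical helper: inside each selected stretch, drop the inner <strong>
// and </strong> tags but keep their text. The outer tags sit in the lookbehind
// and lookahead, so they are not part of the match and stay in place.
function flattenNestedStrong(html) {
  return html.replace(nestedStrong, (selection) => selection.replace(/<\/?strong>/g, ''));
}

const input = '<em>a<em>bbb<strong>ccc<em>ddddd</em></strong>eeee</em></em><strong>fff<em>gg</em>hh<strong>iii</strong>jjj</strong>';
// The second strong element becomes <strong>fff<em>gg</em>hhiiijjj</strong>.
console.log(flattenNestedStrong(input));
```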
Upvotes: 0