samy

Reputation: 14962

Regular expression: finding two elements not surrounding another element in text

I need to find badly formatted HTML content in some text. We let users add strong and em tags, but they don't always close them correctly:

This is some <b>correct</b> formatting
This is some <b>incorrect<b> formatting

I would like to catch instances where the formatting is incorrect, i.e. where an opening tag is not followed by a closing tag. I started using negative lookaheads but have not had much success so far:

<b>(?!.*?<\/b>.*?)<b>

Any idea how I could do that?

Addendum: I know about Tony the pony, but I feel it is not coming right now. This problem could be restated as "I want to find two occurrences of the word 'zoinx' with no occurrence of the word 'palantir' in between", which is not HTML-related.

Upvotes: 5

Views: 132

Answers (1)

vks

Reputation: 67968

<b>(?:(?!<\/b>).)*<b>

Try this. See demo.

https://regex101.com/r/nS2lT4/19
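
For anyone who wants to try this outside regex101, here is a minimal Python sketch of the same pattern. The helper name `first_match` and the sample strings for the "zoinx"/"palantir" case are made up for illustration; the `\/` escape is dropped because Python does not need it. It assumes the tags sit on one line (pass `re.DOTALL` if they can span lines).

```python
import re

# Tempered dot: consume any character that does not begin a closing </b>.
UNCLOSED_B = re.compile(r"<b>(?:(?!</b>).)*<b>")

# Same idea applied to the non-HTML variant from the question's addendum.
ZOINX_PAIR = re.compile(r"zoinx(?:(?!palantir).)*zoinx")


def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches.

    Hypothetical helper, only here to keep the demo short.
    """
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(UNCLOSED_B, "This is some <b>correct</b> formatting"))
print(first_match(UNCLOSED_B, "This is some <b>incorrect<b> formatting"))
print(first_match(ZOINX_PAIR, "zoinx and zoinx"))
print(first_match(ZOINX_PAIR, "zoinx palantir zoinx"))
```

The `(?!</b>).` part checks, one character at a time, that a closing tag does not start at the current position, so the match can only reach the second `<b>` if no `</b>` sits in between.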

For a generalized version, use:

<([^>]*)>(?:(?!<\/\1>).)*<\1>

See demo.

https://regex101.com/r/nS2lT4/24
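
A rough Python sketch of the generalized pattern, in case it helps. `unclosed_tag_name` and the sample strings are hypothetical. The second pattern, `STRICTER`, is not part of the answer above: it is one possible adjustment, assuming tag names never contain `/`, because `[^>]*` can also capture a closing tag such as `/b` and then report two well-formed pairs in a row.

```python
import re

# Pattern from the answer: \1 refers back to whatever was captured as the tag name.
ANY_TAG_UNCLOSED = re.compile(r"<([^>]*)>(?:(?!</\1>).)*<\1>")

# Assumed variant (not from the answer): the tag name must not contain "/".
STRICTER = re.compile(r"<([^>/]+)>(?:(?!</\1>).)*<\1>")


def unclosed_tag_name(pattern, text):
    """Return the captured tag name of the first match, or None.

    Hypothetical helper for the demo.
    """
    m = pattern.search(text)
    return m.group(1) if m else None


print(unclosed_tag_name(ANY_TAG_UNCLOSED, "some <em>broken<em> text"))
print(unclosed_tag_name(ANY_TAG_UNCLOSED, "a <b>bold</b> and <em>fine</em>"))

# Two correct pairs in a row: the original pattern captures "/b" as the name.
print(unclosed_tag_name(ANY_TAG_UNCLOSED, "<b>a</b> <b>b</b>"))
print(unclosed_tag_name(STRICTER, "<b>a</b> <b>b</b>"))
```

Neither pattern handles attributes or nesting; they only flag a repeated opening tag with no matching close in between.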

Upvotes: 3
