regex match expression except specific string (no negative lookahead)

Question

I'm trying to write a regex that matches most HTML elements, for example:

I would like to make an exception for the following HTML tag specifically:

Which I don't want to capture. Is there a way to do it without using negative lookahead/lookbehind?

At the moment I have something like this:

((\%3C)|<)[^)

https://regex101.com/r/ZxkVMJ/2

It does work, but beside

it also doesn't capture all 1 character tags

(like for example)

as well as longer tags that start with b, like for example

Thank you for any help

Tim Biegeleisen · Accepted Answer

As a disclaimer, if you have any kind of XML/HTML parser available, you should really use that for this problem. If you are forced to use regex here, then consider this pattern:

<([^b][^>]*|b[^>]+)>.*?<\/\1>

This matches an HTML tag whose name either starts with a character other than b, or starts with b but is then followed by one or more other characters (thus ruling out <b> itself). Here is a working demo:

Demo
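
For readers who want to try the pattern locally, here is a minimal sketch in Python, assuming the standard `re` module. The helper name `find_elements` and the sample string are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer. Group 1 captures everything between "<" and ">":
#   [^b][^>]*  -> starts with any character other than "b", or
#   b[^>]+     -> starts with "b" followed by at least one more character.
# A bare "b" satisfies neither branch, so <b>...</b> is not matched.
ELEMENT = re.compile(r"<([^b][^>]*|b[^>]+)>.*?</\1>")

def find_elements(html):
    """Return the full text of every element the pattern matches (hypothetical helper)."""
    return [m.group(0) for m in ELEMENT.finditer(html)]

# Illustrative input: one excluded <b> element among others.
sample = "<i>a</i> <b>skip</b> <big>c</big> <p>d</p>"
print(find_elements(sample))
```

Note that group 1 captures the whole tag content, so an opening tag with attributes will not match, because the backreference \1 then expects the attributes to be repeated in the closing tag.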
