Dantuzzo

Reputation: 271

regex match expression except specific string (no negative lookahead)

I'm trying to write a regex that matches most HTML elements, for example:

<script></script>

I would like to make an exception for the following HTML tag specifically:

<b> 

Which I don't want to capture. Is there a way to do it without using negative lookahead/lookbehind?

At the moment I have something like this:

((\%3C)|<)[^<b]((\%2F)|\/)*[^<\/b][a-z0-9\%\=\'\(\)\ ]+((\%3E)|>)

https://regex101.com/r/ZxkVMJ/2

It does work, but besides <b> it also fails to capture all one-character tags (like <a>, for example), as well as longer tags that start with b, such as <balloon>.

Thank you for any help

Upvotes: 1

Views: 175

Answers (1)

Tim Biegeleisen

Reputation: 521194

As a disclaimer, if you have access to any kind of XML/HTML parser, you should really use that for this problem. If you are forced to use regex here, then consider this pattern:

<([^b][^>]*|b[^>]+)>.*?<\/\1>

This matches an HTML element (an opening tag, its content, and the matching closing tag) whose tag either starts with a character other than b, or starts with b but is then followed by one or more other characters (thus ruling out <b>). Here is a working demo:

Demo
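
For a quick local check, here is a minimal Python sketch of the pattern above. The names ELEMENT_EXCEPT_B and matched_elements are made up for illustration, and the sample strings are assumptions based on the tags mentioned in the question:

```python
import re

# The pattern from this answer: alternation stands in for a negative lookahead.
# Branch 1: first character is anything but "b"; branch 2: "b" plus at least one more character.
ELEMENT_EXCEPT_B = re.compile(r"<([^b][^>]*|b[^>]+)>.*?<\/\1>")


def matched_elements(text):
    """Return the full text of every element the pattern matches in `text`."""
    return [m.group(0) for m in ELEMENT_EXCEPT_B.finditer(text)]


samples = [
    "<script></script>",
    "<a>link</a>",
    "<balloon>up</balloon>",
    "<b>bold</b>",
]
for sample in samples:
    print(sample, "->", matched_elements(sample))
```

Note that the backreference \1 repeats everything between < and >, so an opening tag with attributes (for example <a href="x">) will not match a plain </a> closing tag with this pattern, and . does not cross line breaks unless re.DOTALL is set.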

Upvotes: 2
