Reputation: 271
I'm trying to write a regex that matches most HTML elements, for example:
<script></script>
I would like to make an exception for the following HTML tag specifically:
<b>
Which I don't want to capture. Is there a way to do it without using negative lookahead/lookbehind?
At the moment I have something like this:
((\%3C)|<)[^<b]((\%2F)|\/)*[^<\/b][a-z0-9\%\=\'\(\)\ ]+((\%3E)|>)
https://regex101.com/r/ZxkVMJ/2
It does work, but besides <b> it also fails to capture all one-character tags (like <a>, for example), as well as longer tags that start with b, such as <balloon>.
Thank you for any help
Upvotes: 1
Views: 175
Reputation: 521194
As a disclaimer, if you have access to any kind of XML/HTML parser, you should really use that for this problem. If you are forced to use regex here, then consider this pattern:
<([^b][^>]*|b[^>]+)>.*?<\/\1>
This matches an HTML element whose tag either starts with a character other than b, or starts with b but is then followed by one or more other characters (thus ruling out <b>). Note that the backreference \1 means the pattern expects a matching closing tag. Here is a working demo:
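As a quick check of the pattern above, here is a minimal Python sketch (not the demo from the answer). The helper name `find_elements` and the sample string are made up for illustration; the regex itself is copied unchanged from the answer.

```python
import re

# Pattern from the answer above; it uses no lookahead or lookbehind.
TAG_PAIR = re.compile(r"<([^b][^>]*|b[^>]+)>.*?<\/\1>")

def find_elements(text):
    """Return the content of capture group 1 (the tag) for each matched element.

    Hypothetical helper, used only to illustrate the pattern.
    """
    return [m.group(1) for m in TAG_PAIR.finditer(text)]

# Illustrative input: <b> should be skipped, everything else captured.
samples = "<script></script> <b>bold</b> <a>link</a> <balloon>x</balloon>"
print(find_elements(samples))  # ['script', 'a', 'balloon']
```

Because group 1 captures everything between `<` and `>`, an opening tag with attributes would need a closing tag that repeats them, so this sketch is only suitable for attribute-free tags like the ones in the question.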
Upvotes: 2