MagTun

Reputation: 6195

Regex: match a string that doesn't contain a string

I want to replace every <span...> tag (including <span id="... and <span class="...) in an HTML document with a plain <span>, except when the tag starts with <span id="textmarker. A tag that only contains that id further along, for example <span attr="blah" id="textmarker">, should not be kept.

I've tried the regexes proposed here and here, and finally came up with the regex below. It never returns a <span id="textmarker, but it sometimes misses the other spans:

<span(?!.*? id="textmarker).*?">

You can see my (simplified) html here : https://regex101.com/r/yT9jG2/2

Strangely, if I run the regex in Notepad++ it returns 3 matches (the three spans in the second paragraph), but regex101 returns only 1 match. Notepad++ and regex101 both miss the span in the first paragraph.

This regex also doesn't return every span it should (cf. the spans with gray highlights here):

<span(?![^>]*? id="textmarker)[^>]*?>

Upvotes: 2

Views: 185

Answers (1)

clarity123

Reputation: 2046

Updated: To exclude id="textmarker while including id="anythingelse and all other spans:

(<span(?! *id="textmarker)[^>]*>)

On your posted example at https://regex101.com/r/yT9jG2/2, choose version 2 at the top and set the fields as follows:

  • field 1: (<span(?! *id="textmarker)[^>]*>)
  • field 2 (the smaller field that lets you set modifiers): g

With your example and version 2 selected, this gives 9 matches and lists them on the right, including empty spans as well as spans whose id is not textmarker, such as <span id="YellowType">
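
As a rough sketch of the replacement itself outside Notepad++, here is the same pattern used with Python's re module. The sample markup is made up for illustration (it is not the HTML behind the regex101 link), and the outer parentheses are left out since they are optional.

```python
import re

# The pattern from this answer, without the optional outer group.
PATTERN = re.compile(r'<span(?! *id="textmarker)[^>]*>')

# Hypothetical sample markup, assumed for this example only.
html = (
    '<p><span class="a">one</span> '
    '<span id="textmarker1">two</span> '
    '<span id="YellowType">three</span> '
    '<span attr="blah" id="textmarker">four</span> '
    '<span>five</span></p>'
)

# Every opening span tag that does not start with <span id="textmarker
# is replaced by a bare <span>; the closing </span> tags are untouched.
cleaned = PATTERN.sub('<span>', html)
print(cleaned)
```

Only the tag that begins with <span id="textmarker survives with its attributes; the one where id="textmarker comes after another attribute is replaced, as asked in the question.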

Explanation

Field 1:

  • optional: ( and ). An extra outer parenthesis was added to the expression for educational purposes, just for making use of regex101's matched group listing feature to list results on the right pane in addition to the default inline highlighting of matches. When using Notepad++ you can of course omit these outer ( ) parentheses.
  • <span: matches <span
  • (?! starts a negative lookahead assertion for the following,
  •  * (a space followed by *) meaning a space zero or more times, in case you have extra spaces
  • followed by id="textmarker
  • ) to end the negative lookahead assertion
  • so if the text right after <span matches what is inside the lookahead, that position is rejected as a match; the lookahead itself does not consume any characters
  • [^ starts a negated character set, i.e. any character that is not one of the following, the following being the >
  • ] to stop defining the set
  • * to match the preceding 0 or more times. The preceding being [^>]
  • > to match to end of the open-a-span tag
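
The bullet list above can also be written down as an annotated pattern. This is only a sketch using Python's re.VERBOSE mode (an assumption on my part, since the question is about Notepad++ and regex101); in that mode bare spaces are ignored, so the literal space is written as [ ].

```python
import re

SPAN_RE = re.compile(r'''
    <span               # literal start of an opening span tag
    (?!                 # begin negative lookahead: reject this position if what follows is
        [ ]*            #   zero or more spaces (bare spaces are ignored in VERBOSE mode)
        id="textmarker  #   and then the literal text id="textmarker
    )                   # end of the lookahead; no characters were consumed
    [^>]*               # any run of characters other than the closing bracket
    >                   # the bracket that ends the opening tag
''', re.VERBOSE)

print(bool(SPAN_RE.search('<span class="x">')))        # an ordinary span
print(bool(SPAN_RE.search('<span id="textmarker7">')))  # the excluded case
```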

Field 2

  • g is the global modifier; it has nothing to do with greediness, which is controlled by the quantifiers (for example * versus *?)
  • with g set, the search does not stop at the first match but returns all matches
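
To illustrate what the g modifier changes, here is a small Python sketch. Python has no g flag, so as a rough analogy search stands in for "without g" and findall for "with g"; the sample text is invented for the example.

```python
import re

pattern = re.compile(r'<span(?! *id="textmarker)[^>]*>')

# Hypothetical one-line sample, assumed for this example only.
text = '<span class="a">x</span><span id="textmarker">y</span><span>z</span>'

# Without g: the engine stops after the first match.
first_only = pattern.search(text).group(0)

# With g: the engine keeps scanning and collects every match.
all_matches = pattern.findall(text)

print(first_only)
print(all_matches)
```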

Upvotes: 2
