anton

Reputation: 11

Regex repeated groups

I have this text:

<span id="3">

HELLO THERE
<span id="5">
Other stuff
<span id="6">
Other Stuff
<span id="7">
Other sutff

I need to grab just the <span...> elements after the HELLO THERE text. So in the above example, all the spans except for the one with id=3.

So I tried (<span.+?>)+ which grabs all the spans. Next, I tried HELLO THERE.+?(<span.+?>)+, but that only grabs the first relevant one. So my question is, what is the right regex to use here?

Upvotes: 1

Views: 12160

Answers (2)

Emma

Reputation: 27723

RegEx 1

Several expressions can match the desired <span> opening tags. For example, we can simply use:

\s(<.+)

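The \s on the left requires a whitespace character (here, the preceding newline) before the tag, and the capturing group collects everything from < to the end of that line, which in this sample is the tag itself.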
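
As a rough check, here is a minimal Python sketch that runs this expression on the sample from the question. Python's re module is an assumption for illustration, since the answer does not name an engine, and the variable names are hypothetical. Note that the pattern never refers to HELLO THERE: the first span is skipped only because it sits at the very start of the string with no whitespace before it.

```python
import re

# Sample text from the question; the first span starts the string,
# so no whitespace character precedes it.
text = (
    '<span id="3">\n'
    '\n'
    'HELLO THERE\n'
    '<span id="5">\n'
    'Other stuff\n'
    '<span id="6">\n'
    'Other Stuff\n'
    '<span id="7">\n'
    'Other sutff\n'
)

# \s consumes the newline before each tag; the group captures from "<"
# to the end of that line (without DOTALL, "." stops at a newline).
tags = re.findall(r'\s(<.+)', text)
print(tags)  # ['<span id="5">', '<span id="6">', '<span id="7">']

# With a leading newline the first span is preceded by whitespace
# and is matched as well, so the result depends on the input layout.
tags_with_leading_newline = re.findall(r'\s(<.+)', '\n' + text)
print(len(tags_with_leading_newline))  # 4
```

If the input may start with whitespace, or may contain tags before the marker on later lines, this expression alone is probably not enough to express "after HELLO THERE".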

RegEx 2

A more complex and more expensive alternative would be:

([\s\S].*?)(<.+>)

RegEx 3

A variant appends * to the tag group, making the tag optional, so the expression can also match where no tag follows:

([\s\S].*?)(<.+>)*

RegEx Circuit

The expressions can also be visualized on jex.im.

Upvotes: 2

Joanna Derks

Reputation: 4063

This regex matches all the span tags after HELLO THERE, capturing each one with a repeated group:

HELLO THERE(?:(?:.*?)(<span[^>]+>))+
  • HELLO THERE - match the literal marker text
    Inside the non-capturing group:
  • (?:.*?) - lazily match any text (possibly none) until you find
  • (<span[^>]+>) - the span tag; this one is captured
  • + - repeat the previous two steps until no more span tags can be found

You also need to enable the "dot matches newline" option (often called single-line or DOTALL mode) so that . can cross line breaks.
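
How many captures you can read back from a repeated group depends on the engine. Some engines (for example .NET, through a group's Captures collection) expose every iteration. In Python's built-in re module, which is an assumption here since the answer does not name an engine, a repeated group keeps only its last iteration. The sketch below shows that behavior, and then a two-step alternative that is a different technique from the single regex above: cut the text at HELLO THERE and collect the tags from the remainder. The variable names are hypothetical.

```python
import re

# Sample text from the question.
text = (
    '<span id="3">\n'
    '\n'
    'HELLO THERE\n'
    '<span id="5">\n'
    'Other stuff\n'
    '<span id="6">\n'
    'Other Stuff\n'
    '<span id="7">\n'
    'Other sutff\n'
)

pattern = r'HELLO THERE(?:(?:.*?)(<span[^>]+>))+'

# re.DOTALL is Python's "dot matches newline" option.
match = re.search(pattern, text, re.DOTALL)

# The overall match runs through all three spans, but group 1 only
# holds the capture from the final repetition.
last_tag = match.group(1)
print(last_tag)  # <span id="7">

# Alternative (not the answer's method): keep only the text after the
# marker, then find every span tag in that remainder.
_, marker, tail = text.partition('HELLO THERE')
tags_after = re.findall(r'<span[^>]+>', tail) if marker else []
print(tags_after)  # ['<span id="5">', '<span id="6">', '<span id="7">']
```

The two-step version avoids relying on engine-specific capture handling, at the cost of no longer being a single expression.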

Upvotes: 0
