Eldar

Reputation: 5217

Match all specified words within a tag

I want to match the following pattern: every word that consists only of uppercase letters, is enclosed in parentheses, and sits inside <b></b> tags.

Example:

(ABC) 'must extract none
<b>(ABC) 'must extract none
<b>(ABC)(CDE)(EFG)</b> 'must extract ABC, CDE and EFG
<b> shr (ABC) апаd (CDE)   lgsgs   </b> 'must extract ABC and CDE
<b>A</b>(ABCA)<b>(ABCB)</b> 'must extract only ABCB
<b>A</b>(ABCA)<b>dada(ABCB)wsg</b> 'must extract only ABCB
<b>AB</b>(ABCA)<b>BC</b>(ABCB) 'must extract none

I tried the following pattern, but it matches only the first occurrence:

"(<b>(?:(?!<\/?b>).)*?\()([A-Z]+)(\)(?:(?!<\/?b>).)*<\/b>)"

Upvotes: 0

Views: 75

Answers (1)

Avinash Raj

Reputation: 174696

You could try the regex below.

(?:[A-Z]+(?=\)))(?=(?:(?!<\/?b>).)*<\/b>)
  • (?:[A-Z]+(?=\))) matches one or more uppercase letters, but only if they are immediately followed by a closing ) bracket.

  • (?=(?:(?!<\/?b>).)*<\/b>) additionally requires that the match is followed by zero or more characters, none of which starts an opening or closing <b> tag, and then by a closing </b> tag.

DEMO
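As a quick check, here is a minimal sketch that runs the regex above against the samples from the question. It assumes Python's re module (the question does not name a language), and the names PATTERN, SAMPLES and extract are only illustrative.

```python
import re

# The regex from the answer, copied verbatim. It has no capturing groups,
# so findall() returns the full text of each match.
PATTERN = re.compile(r"(?:[A-Z]+(?=\)))(?=(?:(?!<\/?b>).)*<\/b>)")

# Sample inputs taken from the question.
SAMPLES = [
    "(ABC)",
    "<b>(ABC)",
    "<b>(ABC)(CDE)(EFG)</b>",
    "<b> shr (ABC) апаd (CDE)   lgsgs   </b>",
    "<b>A</b>(ABCA)<b>(ABCB)</b>",
    "<b>A</b>(ABCA)<b>dada(ABCB)wsg</b>",
    "<b>AB</b>(ABCA)<b>BC</b>(ABCB)",
]


def extract(text):
    """Return every uppercase word matched by PATTERN in text."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", extract(sample))
```

For these seven samples the results agree with the expectations listed in the question. One caveat: the pattern only looks ahead, so it does not itself verify the opening ( or the opening <b>; an input such as ABC)</b> would still yield ABC.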

OR

Simply like this,

(?:[A-Z]+(?=\)))(?=[^<>]*<\/b>)

DEMO
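The two variants are not fully interchangeable, which the following sketch tries to illustrate. It again assumes Python's re module; the names TEMPERED and SIMPLE and the nested <i> input are made up for the comparison and do not come from the question.

```python
import re

# First variant: skips any characters as long as no <b> or </b> tag starts there.
TEMPERED = re.compile(r"(?:[A-Z]+(?=\)))(?=(?:(?!<\/?b>).)*<\/b>)")

# Simpler variant: allows no "<" or ">" at all between the word and </b>.
SIMPLE = re.compile(r"(?:[A-Z]+(?=\)))(?=[^<>]*<\/b>)")

# A sample from the question: both variants agree here.
plain = "<b>A</b>(ABCA)<b>(ABCB)</b>"

# Hypothetical input with another tag nested inside <b>: the variants differ.
nested = "<b>(ABC) <i>x</i></b>"

if __name__ == "__main__":
    for text in (plain, nested):
        print(repr(text), TEMPERED.findall(text), SIMPLE.findall(text))
```

On the question's samples both behave the same. The simpler one stops at the first < or >, so it finds nothing in the nested input, while the first variant still returns ABC there. Which behavior is wanted depends on whether other tags can appear inside <b>.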

Upvotes: 2
