Andrey Kucher

Reputation: 471

How to get all regex matches from substrings only with regex

I'm trying to get all hex colors from style attributes in HTML, but not any other hex color values. This is really a general task I want to understand, so I don't want a different solution: regex only. In other words, I need to get substrings matching a regex pattern (a hex color in this case) from within a substring delimited by known start and end patterns (style="substring to get values here").

My pattern

(?<=style=").*(#[A-F0-9]{6}).*(?=")

My test html:

<span style="color: #FF0000;background-color: #FFFF99;font-family: Calibri;font-size: 11pt;font-weight: bold;font-style: normal">This shouldn't be in result #FFFF99</span>
<span style="color: #FF0000;background-color: #FFFF99;font-family: Calibri;font-size: 11pt;font-weight: bold;font-style: normal">This shouldn't be in result #FFFF99</span>

I can only get the last entry with this pattern, but I need to get all of them (so in my example I should get 4 color values: 2 from the first span and 2 from the second one). How can I achieve this? Thanks in advance!

Upvotes: 0

Views: 46

Answers (1)

The fourth bird

Reputation: 163217

If a quantifier in a positive lookbehind is supported (for example in .NET or ES2018+ JavaScript):

(?<=\bstyle="[^"]*)#[A-F0-9]{6}\b(?=[^"]*")
  • (?<=\bstyle="[^"]*) Positive lookbehind, assert that to the left there is style=" followed by 0+ occurrences of any char except "
  • #[A-F0-9]{6}\b Match # and 6 times any of the listed chars, followed by a word boundary to prevent a partial match inside a longer run of word characters
  • (?=[^"]*") Positive lookahead, assert that to the right there are 0+ occurrences of any char except " and then a "
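
To make the breakdown above concrete, here is a minimal sketch in TypeScript, assuming an engine with variable-length lookbehind (ES2018+ JavaScript). The helper name findAll and the variable names are only illustrative; the HTML is the test input from the question.

```typescript
// Test HTML from the question: two spans, each with two colors in the style
// attribute and one color in the text content that must NOT be matched.
const span: string =
  '<span style="color: #FF0000;background-color: #FFFF99;font-family: Calibri;' +
  'font-size: 11pt;font-weight: bold;font-style: normal">' +
  "This shouldn't be in result #FFFF99</span>";
const html: string = span + "\n" + span;

// Built with the RegExp constructor, so backslashes are doubled in the string.
// Assumption: the runtime supports a quantifier inside a lookbehind.
const styleHex = new RegExp('(?<=\\bstyle="[^"]*)#[A-F0-9]{6}\\b(?=[^"]*")', "g");

// Hypothetical helper: collect every match of a global regex.
function findAll(re: RegExp, text: string): string[] {
  const out: string[] = [];
  re.lastIndex = 0; // start from the beginning on every call
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push(m[0]);
  }
  return out;
}

const colors: string[] = findAll(styleHex, html);
console.log(colors); // [ '#FF0000', '#FFFF99', '#FF0000', '#FFFF99' ]
```

The #FFFF99 in the text content is skipped because the only " characters to its left are the opening and closing quotes of the attribute, so the lookbehind cannot reach style=" without crossing a ".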


Note that this matches the word style anywhere and is not bound to an element, so a style="..." that appears outside of a tag would also qualify.

You could also assert the surrounding angle brackets of the tag, but that is brittle and can easily break:

(?<=<[^<>]*\bstyle="[^"]*)#[A-F0-9]{6}\b(?=[^"<>]*"[^<>]*>)
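
A small sketch of what the tag-bounded pattern buys you and where it breaks, again assuming ES2018+ lookbehind support. The two sample strings and the helper collectMatches are made up for illustration and are not from the question.

```typescript
// Tag-bounded variant: the match must sit between < and > of the same tag.
const tagStyleHex = new RegExp(
  '(?<=<[^<>]*\\bstyle="[^"]*)#[A-F0-9]{6}\\b(?=[^"<>]*"[^<>]*>)',
  "g"
);

// Hypothetical helper: collect every match of a global regex.
function collectMatches(re: RegExp, text: string): string[] {
  const out: string[] = [];
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push(m[0]);
  }
  return out;
}

// style="..." written in the text content is ignored; only the attribute counts.
const inText = '<p style="color: #112233">Docs say style="#ABCDEF" works</p>';
const fromText: string[] = collectMatches(tagStyleHex, inText);
console.log(fromText); // [ '#112233' ]

// Brittle: a ">" inside an earlier attribute value hides the real style attribute.
const withAngle = '<p title="a>b" style="color: #445566">text</p>';
const fromAngle: string[] = collectMatches(tagStyleHex, withAngle);
console.log(fromAngle); // []
```

The second case is valid HTML, yet the color is missed because [^<>]* in the lookbehind cannot cross the > inside the title value.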

Upvotes: 1
