Reputation: 471
I'm trying to get all hex colors from style attributes in HTML, but no other hex color values. This is a common task that I want to understand, so I'm not looking for other solutions, regex only. In other words, I need to get the substrings that match a regex pattern (a hex color in this case) from within a substring delimited by known start and end patterns (style="substring to get values from").
My pattern:
(?<=style=").*(#[A-F0-9]{6}).*(?=")
My test html:
<span style="color: #FF0000;background-color: #FFFF99;font-family: Calibri;font-size: 11pt;font-weight: bold;font-style: normal">This shouldn't be in result #FFFF99</span>
<span style="color: #FF0000;background-color: #FFFF99;font-family: Calibri;font-size: 11pt;font-weight: bold;font-style: normal">This shouldn't be in result #FFFF99</span>
With this pattern I only get the last entry, but I need all of them (so in my example I should get 4 color values: 2 from the first span and 2 from the second one). How can I achieve this? Thanks in advance!
Upvotes: 0
Views: 46
Reputation: 163217
If a quantifier in a positive lookbehind is supported:
(?<=\bstyle="[^"]*)#[A-F0-9]{6}\b(?=[^"]*")
(?<=\bstyle="[^"]*)
Positive lookbehind: assert style=" followed by 0+ occurrences of any char except " to the left.
#[A-F0-9]{6}\b
Match # and 6 times any of the listed chars, followed by a word boundary to prevent a partial match.
(?=[^"]*")
Positive lookahead: assert 0+ occurrences of any char except " followed by a " to the right.

Note that this matches the word style anywhere and is not bound to an element.
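As a rough check, the pattern can be run against the test HTML from the question. This is only a sketch: it assumes a JavaScript/TypeScript runtime whose regex engine allows quantifiers inside a lookbehind (ES2018 or later), and the helper name extractStyleColors is made up for illustration.

```typescript
// Hypothetical helper: collects hex colors that appear inside style="..." values.
// Assumes the regex engine supports quantifiers in a lookbehind (ES2018+).
function extractStyleColors(html: string): string[] {
  const pattern = /(?<=\bstyle="[^"]*)#[A-F0-9]{6}\b(?=[^"]*")/g;
  // match() with the g flag returns every match, or null when there are none.
  return html.match(pattern) ?? [];
}

// The two spans from the question, joined by a newline.
const span = `<span style="color: #FF0000;background-color: #FFFF99;font-family: Calibri;font-size: 11pt;font-weight: bold;font-style: normal">This shouldn't be in result #FFFF99</span>`;
const html = [span, span].join("\n");

console.log(extractStyleColors(html));
// [ '#FF0000', '#FFFF99', '#FF0000', '#FFFF99' ]
```

The #FFFF99 in the text content is skipped because the nearest " to its left closes the attribute value and is not preceded by style=.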
A more brittle option is to also match the surrounding angle brackets, so that the attribute has to be inside a tag, but this can easily break:
(?<=<[^<>]*\bstyle="[^"]*)#[A-F0-9]{6}\b(?=[^"<>]*"[^<>]*>)
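To illustrate the trade-off, here is a small comparison of the two patterns. It is a sketch under the same assumption as before (an engine that allows quantifiers in a lookbehind); the helper findAll and the sample strings are invented for this example.

```typescript
// Both patterns are copied from the answer; the sample strings are made up.
const loose = /(?<=\bstyle="[^"]*)#[A-F0-9]{6}\b(?=[^"]*")/g;
const strict = /(?<=<[^<>]*\bstyle="[^"]*)#[A-F0-9]{6}\b(?=[^"<>]*"[^<>]*>)/g;

// Hypothetical helper: all matches of a global regex, or an empty array.
function findAll(pattern: RegExp, text: string): string[] {
  return text.match(pattern) ?? [];
}

// style="..." appears once as a real attribute and once as plain text.
const sample = '<p style="color: #FF0000">Use style="#ABCDEF" here</p>';
const looseHits = findAll(loose, sample);   // [ '#FF0000', '#ABCDEF' ]
const strictHits = findAll(strict, sample); // [ '#FF0000' ]

// A ">" inside an earlier attribute value defeats the bracket check.
const tricky = '<a title="a > b" style="color: #FF0000">link</a>';
const trickyHits = findAll(strict, tricky); // [] (the valid color is missed)

console.log(looseHits, strictHits, trickyHits);
```

The bracket-aware pattern rejects the plain-text occurrence, but it also drops a legitimate color as soon as a > shows up in an attribute value before style, which is the kind of breakage meant above.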
Upvotes: 1