Kappers

Reputation: 1331

Regex to Match HTML Style Properties

In need of a regex master here!

<img src="\img.gif" style="float:left; border:0" />
<img src="\img.gif" style="border:0; float:right" />

Given the above HTML, I need a regex pattern that will match "float:right" or "float:left" but only on an img tag.

Thanks in advance!

Upvotes: 0

Views: 4113

Answers (3)

brianary

Reputation: 9332

I agree with Sean Nyman: it's best not to use a regex (at least not for anything permanent). For something ad hoc that is still a bit more durable, you might try:

/<img\s(?:\s*\w+\s*=\s*(?:'[^']*'|"[^"]*"))*?\s*\bstyle\s*=\s*(?:"[^"]*?\bfloat\s*:\s*(\w+)|'[^']*?\bfloat\s*:\s*(\w+))/i
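
For anyone who wants to try this, here is a minimal sketch in Python (the question does not name a language, so that choice is an assumption). It assumes the final non-capturing group is closed with a ")" before the flags, and the names IMG_FLOAT and float_value are made up for illustration. Because the double-quoted and single-quoted branches each have their own capturing group, the value ends up in group 1 or group 2 depending on the quote style.

```python
import re

# The pattern from above, split over two raw strings for readability.
# Assumption: the last (?: ... ) group is closed before the flags.
IMG_FLOAT = re.compile(
    r"""<img\s(?:\s*\w+\s*=\s*(?:'[^']*'|"[^"]*"))*?\s*\bstyle\s*=\s*"""
    r"""(?:"[^"]*?\bfloat\s*:\s*(\w+)|'[^']*?\bfloat\s*:\s*(\w+))""",
    re.IGNORECASE,
)


def float_value(tag):
    """Return the float value found in an img tag's style, or None."""
    m = IMG_FLOAT.search(tag)
    if m is None:
        return None
    # Group 1 is set for double-quoted styles, group 2 for single-quoted ones.
    return m.group(1) or m.group(2)


print(float_value('<img src="\\img.gif" style="border:0; float:right" />'))
print(float_value("<img src='a.gif' style='float:left; border:0'>"))
print(float_value('<div style="float:left"></div>'))
```

Note that the pattern only skips over quoted attributes, so an img tag with an unquoted attribute before style would not match.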

Upvotes: 0

Sean

Reputation: 4470

You really shouldn't use regex to parse HTML or XML: it's impossible to design a foolproof regex that handles every corner case. Instead, I would suggest finding an HTML-parsing library for your language of choice.

That said, here's a possible solution using regex.

<img\s[^>]*?style\s*=\s*".*?(?<="|;)\s*(float:.*?)(?=;|").*?"

The float declaration (for example, float:left) will be captured in the only capturing group there, which is group 1.

The regex matches the start of an img tag, then any run of characters other than a closing bracket, then the style attribute. Inside the attribute's value the float declaration can appear anywhere, but only the actual float style should match: it must be preceded by the start of the attribute or a semicolon, and followed by a semicolon or the end of the attribute.
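
As a quick check of that description, here is a small sketch in Python (an assumed language, since the question does not name one). It assumes the lookbehind is spelled (?<=...) and that optional whitespace is allowed between the semicolon and float, so that "border:0; float:right" is also picked up. The names FLOAT_IN_IMG and find_floats are hypothetical.

```python
import re

# Assumed variant of the pattern: lookbehind written as (?<=...), plus \s*
# so a space after the semicolon does not block the match.
FLOAT_IN_IMG = re.compile(
    r'<img\s[^>]*?style\s*=\s*".*?(?<="|;)\s*(float:.*?)(?=;|").*?"',
    re.IGNORECASE,
)


def find_floats(html):
    """Return the float declaration captured from each matching img tag."""
    return [m.group(1) for m in FLOAT_IN_IMG.finditer(html)]


sample = (
    '<img src="\\img.gif" style="float:left; border:0" />\n'
    '<img src="\\img.gif" style="border:0; float:right" />\n'
    '<div style="float:left"></div>'
)
print(find_floats(sample))  # the div is ignored; only the two img tags match
```

The lookbehind works here because both alternatives are a single character; many engines reject lookbehinds of variable width.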

Upvotes: 2

chaos

Reputation: 124365

/<img\s[^>]*style\s*=\s*"[^"]*\bfloat\s*:\s*(left|right)[^"]*"/i

Have to advise you, though: in my experience, no matter what regex you write, someone will be able to come up with valid HTML that breaks it. If you really want to do this in a general, reliable way, you need to parse the HTML, not throw regexes at it.
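
To give a rough idea of what the parsing route can look like, here is a sketch that assumes Python and its standard-library html.parser module; the question does not say which language is in use, and ImgFloatFinder and find_img_floats are names invented for this example. The style value is split naively on semicolons, which is enough for declarations like the ones in the question but not for every CSS value.

```python
from html.parser import HTMLParser


class ImgFloatFinder(HTMLParser):
    """Collect the float value from the style attribute of every img tag."""

    def __init__(self):
        super().__init__()
        self.floats = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; self-closing tags such
        # as <img ... /> are routed here by the default handle_startendtag.
        if tag != "img":
            return
        style = dict(attrs).get("style") or ""
        # Naive split: assumes no semicolons inside individual CSS values.
        for declaration in style.split(";"):
            name, _, value = declaration.partition(":")
            if name.strip().lower() == "float":
                self.floats.append(value.strip().lower())


def find_img_floats(html):
    """Return the float values found on img tags, in document order."""
    finder = ImgFloatFinder()
    finder.feed(html)
    finder.close()
    return finder.floats


print(find_img_floats(
    '<img src="\\img.gif" style="float:left; border:0" />'
    "<IMG SRC='a.gif' STYLE='border:0; FLOAT : Right'>"
    '<div style="float:left"></div>'
))
```

Quote style, attribute order, and letter case are handled by the parser rather than by the pattern, which is where hand-written regexes tend to break.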

Upvotes: 4
