Reputation: 76547
I have a large batch of HTML containing several inline CSS properties that look like the following:
font:16px/normal Consolas;
font:16px/normal Arial;
font:12px/normal Courier;
These are mixed in with many other CSS properties, HTML tags and attribute values.
I've been trying to write a regular expression that grabs only these "font styles", so that given the following markup:
<p style='font:16px/normal Arial; font-weight: x; color: y;'>Stack</p>
<span style='color: z; font:16px/normal Courier;'>Overflow</span>
<br />
<div style='font-family: Segoe UI; font-size: xx-large;'>Really large</div>
it would match only the properties that begin with font: and end with a semicolon (;), i.e. font:16px/normal Arial; and font:16px/normal Courier;, but not font-family or font-size.
I've played around with RegexHero, and the closest I've gotten is:
\b(?:font[\s*\\]*:[\s*\\]*?(\b.*\b);)
which yielded the following results:
font:bold; //Match
font:12pt/normal Arial; //Match
font:16px/normal Consolas; //Match
font:12pt/normal Arial; //Match
property: value; //Not a Match
property: value value value; //Not a Match
but when I run it against a large block of HTML, the matches get muddled: large chunks of text are selected instead of stopping at the end of each font: declaration.
I'll be glad to provide any additional info and test data that I can.
Upvotes: 5
Views: 2052
Reputation: 30273
You've left the .* greedy, which means it keeps consuming characters and only stops at the last semicolon it can reach. Add a question mark, i.e. .*?, to make it non-greedy.
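A minimal sketch of the difference, using Python's re module as an assumption (the question used RegexHero, which is .NET, but these constructs should behave the same on this input). The question's markup is joined onto one line here because . does not match newlines by default; html, greedy and lazy are illustrative names.

```python
import re

# The question's sample markup, joined onto a single line (illustrative test data).
html = (
    "<p style='font:16px/normal Arial; font-weight: x; color: y;'>Stack</p>"
    "<span style='color: z; font:16px/normal Courier;'>Overflow</span>"
    "<br />"
    "<div style='font-family: Segoe UI; font-size: xx-large;'>Really large</div>"
)

# The asker's pattern: greedy .* runs to the end, then backtracks only as far as
# the LAST semicolon that follows a word character.
greedy = re.compile(r"\b(?:font[\s*\\]*:[\s*\\]*?(\b.*\b);)")

# Same pattern with a lazy .*? so each match stops at the FIRST such semicolon.
lazy = re.compile(r"\b(?:font[\s*\\]*:[\s*\\]*?(\b.*?\b);)")

greedy_hits = greedy.findall(html)
lazy_hits = lazy.findall(html)

print(greedy_hits)  # a single capture spanning all four elements
print(lazy_hits)    # ['16px/normal Arial', '16px/normal Courier']
```

The greedy version returns one capture that starts at the first font: and ends at xx-large, which is the "large blocks" symptom from the question.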
Updated:
\b(?:font\s*?:\s*([^;>]*?)(?=[;">}]))
I've tested every example on this page at http://rubular.com/r/yRcED2n6wu.
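To check the updated pattern outside an online tester, here is a small Python sketch (Python's re is an assumption, not the tool used above). The samples are the question's markup plus one hypothetical style attribute with no trailing semicolon; FONT_RE and samples are illustrative names.

```python
import re

# Updated pattern: group 1 captures the value after "font:".
# The lookahead ends the lazy group at ; " > or } without consuming it.
FONT_RE = re.compile(r'\b(?:font\s*?:\s*([^;>]*?)(?=[;">}]))')

samples = [
    "<p style='font:16px/normal Arial; font-weight: x; color: y;'>Stack</p>",
    "<span style='color: z; font:16px/normal Courier;'>Overflow</span>",
    "<div style='font-family: Segoe UI; font-size: xx-large;'>Really large</div>",
    # Hypothetical extra case: double-quoted attribute, no trailing semicolon.
    '<span style="font:12px/normal Courier">No semicolon</span>',
]

# font-family and font-size fail because "font" must be followed by ":".
values = [v for s in samples for v in FONT_RE.findall(s)]
print(values)  # ['16px/normal Arial', '16px/normal Courier', '12px/normal Courier']
```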
Upvotes: 4
Reputation: 33908
I'd suggest:
\bfont\s*:\s*([^;}"'<>]+)(?<=\S)
This also works in cases where the other answers fail, such as declarations with no trailing semicolon:
.foo { font: sans-serif 80% }
... style="font: sans-serif 80%" ...
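A quick Python sketch of how this pattern behaves on those examples and on the question's markup (Python's re is an assumption; FONT_RE and examples are illustrative names). The point of (?<=\S) is that the greedy character class first swallows the trailing space before }, and the lookbehind forces the engine to give that space back.

```python
import re

# Greedy class stops at ; } " ' < or >; the lookbehind (?<=\S) then makes the
# engine backtrack until the captured value no longer ends in whitespace.
FONT_RE = re.compile(r"""\bfont\s*:\s*([^;}"'<>]+)(?<=\S)""")

examples = [
    ".foo { font: sans-serif 80% }",
    '... style="font: sans-serif 80%" ...',
    "<p style='font:16px/normal Arial; font-weight: x; color: y;'>Stack</p>",
]

for text in examples:
    print(FONT_RE.findall(text))
# ['sans-serif 80%']
# ['sans-serif 80%']
# ['16px/normal Arial']
```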
Upvotes: 1
Reputation: 11182
Try this
\b((?:font:[^;]*?)(?:;|'))
Explanation
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 1
(?: # Match the regular expression below
font: # Match the characters “font:” literally
[^;] # Match any character that is NOT a “;”
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
; # Match the character “;” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
' # Match the character “'” literally
)
)
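A minimal Python check of this pattern (Python's re is an assumption; FONT_RE and html are illustrative names). Because the (?:;|') alternation sits inside capture group 1, the terminator is part of each result. The second line is hypothetical markup with no semicolon, to show the ' alternative in action.

```python
import re

# "font:" followed lazily by non-semicolons, ending at the first ";" or "'".
# Group 1 wraps the whole thing, so the terminator is included in the result.
FONT_RE = re.compile(r"\b((?:font:[^;]*?)(?:;|'))")

html = (
    "<p style='font:16px/normal Arial; font-weight: x; color: y;'>Stack</p>\n"
    # Hypothetical case: no semicolon, so the closing quote ends the match.
    "<span style='font:12px/normal Courier'>No semicolon</span>"
)

print(FONT_RE.findall(html))
# ['font:16px/normal Arial;', "font:12px/normal Courier'"]
```

Note that this pattern requires "font:" with no whitespace before the colon.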
Upvotes: 5
Reputation: 7761
Try this RegEx:
(?:font:[^;]*);
It matches font:16px/normal Arial; and font:16px/normal Courier; from your snippet above.
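A quick Python check against the question's snippet (Python's re is an assumption; FONT_RE and snippet are illustrative names). Since the pattern has no capturing group, re.findall returns the whole matches. It also shows why a greedy quantifier is safe here: the negated class [^;] cannot run past a semicolon, so each match still ends at the first one.

```python
import re

# [^;]* is greedy, but it can never cross a ";", so each match ends at the
# first semicolon after "font:". No capturing group, so findall returns
# the full matched text.
FONT_RE = re.compile(r"(?:font:[^;]*);")

snippet = """<p style='font:16px/normal Arial; font-weight: x; color: y;'>Stack</p>
<span style='color: z; font:16px/normal Courier;'>Overflow</span>
<br />
<div style='font-family: Segoe UI; font-size: xx-large;'>Really large</div>"""

print(FONT_RE.findall(snippet))
# ['font:16px/normal Arial;', 'font:16px/normal Courier;']
```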
Upvotes: 2
Reputation: 11
I'm not quite sure what you're asking, but I think this problem can be solved by replacing your inline style attributes with CSS rules. Place the following in the <head> of your HTML:
<style type="text/css">
h1 {
font-family: Arial;
font-size: 15px;
font-style:oblique;
}
h2 {
font-family: Courier;
font-size: 16px;
font-style:oblique;
}
h3 {
font-family: Segoe UI;
font-size: xx-large;
font-style:oblique;
}
</style>
Now, to apply one of these font styles (whether with an expression or by hand), all you have to do is wrap the text in the corresponding tag, like so:
<h1> Cool Text! </h1>
Good luck!
Upvotes: 0