petko_stankoski

Reputation: 10713

Get style attributes with regex from html string

This is my HTML string:

<p style="opacity: 1; color: #000000; font-weight: bold; font-style: italic; text-decoration: line-through; background-color: #ffffff;">100 gram n!uts</p>

I want to get the font-weight value, if there is one. How do I do this with a regex?

Upvotes: 1

Views: 1918

Answers (2)

fubo

Reputation: 45947

This should solve it:

(?<=font-weight: )[0-9A-Za-z]+(?=;)

Explanation:

(?<=font-weight: ) is a lookbehind: the text immediately before the result has to be "font-weight: " (including the single space)

[0-9A-Za-z]+ the result contains only letters and digits, and at least one of them

(?=;) is a lookahead: the first character after the result has to be a ;

Code:

// Requires: using System.Text.RegularExpressions;
string Pattern = @"(?<=font-weight: )[0-9A-Za-z]+(?=;)";
string Value = "<p style=\"opacity: 1; color: #000000; font-weight: bold; font-style: italic; text-decoration: line-through; background-color: #ffffff;\">100 gram n!uts</p>";
string Result = Regex.Match(Value, Pattern).Value; // "bold"; an empty string if there is no match
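
One caveat, offered as a hedged sketch rather than as part of the answer above: the pattern needs exactly one space after the colon and a trailing semicolon, so it would miss font-weight:bold or a final declaration with no ; after it. The helper below relaxes both and uses a capture group instead of lookarounds. The names FontWeightExtractor and GetFontWeight are hypothetical, chosen only for illustration. It still scans the raw string, so it would also match the text font-weight: if it appeared outside a style attribute.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FontWeightExtractor
{
    // Hypothetical helper (not from the original answer).
    // \s* tolerates any whitespace around the colon; the value is
    // captured in group 1, so no trailing ';' is required.
    private static readonly Regex FontWeight =
        new Regex(@"\bfont-weight\s*:\s*([0-9A-Za-z]+)", RegexOptions.IgnoreCase);

    // Returns the first font-weight value found, or null if there is none.
    public static string GetFontWeight(string html)
    {
        Match m = FontWeight.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling FontWeightExtractor.GetFontWeight on the string from the question returns bold, and it returns null when the string has no font-weight declaration, which makes the "if there is one" case explicit.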

Upvotes: 2

Wiktor Stribiżew

Reputation: 626825

If you plan to use an HTML parser in the future, you might want to have a look at CsQuery. Just install the NuGet package for your solution and use it as shown in the snippet below.

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
var html = "<p style=\"opacity: 1; color: #000000; font-weight: bold; font-style: italic; text-decoration: line-through; background-color: #ffffff;\">100 gram n!uts</p>";
var cq = CsQuery.CQ.CreateFragment(html);
foreach (var obj in cq.Select("p"))
{
    var style = string.Empty;
    var has_attr = obj.TryGetAttribute("style", out style);
    if (has_attr)
    {
        // Using LINQ and string methods
        var fontweight = style.Split(';').FirstOrDefault(p => p.Trim().StartsWith("font-weight:"));
        // FirstOrDefault returns null when no declaration matches, so check before calling any method on it
        if (!string.IsNullOrWhiteSpace(fontweight))
            Console.WriteLine(fontweight.Split(':')[1].Trim());
        // Or a regex (note: if there is no font-weight, Replace returns the style string unchanged)
        var font_with_regex = Regex.Replace(style, @".*?\bfont-weight:\s*([^;]+).*", "$1", RegexOptions.Singleline);
        Console.WriteLine(font_with_regex);
    }
}

Note that running a regex replacement is quite safe at this point, since we are only working with a short plain string: there are no optional quotes around it and no tags to take care of.

If you need to load a URL, use

var cq = CsQuery.CQ.CreateFromUrl("http://www.example.com");

This is much safer than using a regex like the following, which is hard to read and is likely to fail on a very large input text:

<p\s[^<]*\bstyle="[^<"]*\bfont-weight:\s*([^"<;]+)
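
For comparison only, here is a minimal sketch of how that tag-anchored pattern would be applied, reading the value from capture group 1. The class and method names (TagAnchoredExample, GetFontWeight) are hypothetical, and the sketch assumes the style attribute uses double quotes, exactly as the pattern itself does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagAnchoredExample
{
    // The pattern shown above, written as a verbatim string
    // (each double quote inside it is doubled).
    private const string Pattern =
        @"<p\s[^<]*\bstyle=""[^<""]*\bfont-weight:\s*([^""<;]+)";

    // Returns the font-weight of the first matching <p> tag, or null if none matches.
    public static string GetFontWeight(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Even in this small form the pattern only handles p tags with a double-quoted style attribute, which illustrates why the parser-based approach is easier to maintain.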

Upvotes: 0
