lance-p

Reputation: 1070

C# Regular Expression Find Variable Pattern

I need to parse HTML that is formatted like the sample below. The problem is that a field name can be wrapped in <SPAN> tags with varying background-color or color styles. The pattern I am looking for is a <br /> tag, then the field name (ignoring any span that wraps it), followed by a colon. Without a span wrapper, this is simply <br />id:. Matching this pattern should give me the key name, and whatever follows the key name is the key value, up to the next key name. Below is a sample of the HTML I need to parse.

string source = @"
<br />id: Value here
        <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">community</SPAN>: Value here
        <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">content</SPAN><SPAN style=""background-color: #A0FFFF; color: #000000"">title</SPAN>: Value here
";
// Split the source into key/value pairs based on the pattern match.

Thanks for any help.
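A minimal sketch of the rule described in the question, not taken from either post. It swaps in a different technique: instead of matching the <SPAN> wrappers inside the pattern, it assumes they carry no meaning and strips them first, so adjacent spans such as `content` and `title` merge into a single key `contenttitle` (an assumption; the question does not say how that line should be read). Each value then runs up to the next <br /> or the end of the input. `ParsePairs` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question, as a verbatim string (inner quotes doubled).
string source = @"
<br />id: Value here
<br /><SPAN style=""background-color: #A0FFFF; color: #000000"">community</SPAN>: Value here
<br /><SPAN style=""background-color: #A0FFFF; color: #000000"">content</SPAN><SPAN style=""background-color: #A0FFFF; color: #000000"">title</SPAN>: Value here
";

// Hypothetical helper: assumes SPAN tags are purely cosmetic, removes every
// opening/closing SPAN tag, then reads "key: value" pairs that start at a <br />.
static List<KeyValuePair<string, string>> ParsePairs(string html)
{
    string plain = Regex.Replace(html, @"</?span\b[^>]*>", "", RegexOptions.IgnoreCase);

    // Group 1: key text up to the first colon.
    // Group 2: value, lazily matched up to the next <br /> or the end of input.
    var pattern = new Regex(@"<br\s*/?>\s*([^:<]+?)\s*:\s*(.*?)\s*(?=<br\s*/?>|$)",
                            RegexOptions.IgnoreCase | RegexOptions.Singleline);

    return pattern.Matches(plain)
                  .Cast<Match>()
                  .Select(m => new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value))
                  .ToList();
}

foreach (var pair in ParsePairs(source))
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```

On the sample this prints `id`, `community` and `contenttitle`, each paired with `Value here`. Because the value is bounded by a lookahead for the next <br /> rather than by the end of the line, a value that wraps onto several lines would still be captured whole.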

Upvotes: 0

Views: 475

Answers (1)

Steve Ruble

Reputation: 3895

Here's some code that'll parse it, assuming that your example HTML should have another <br /> element after `content`.

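using System.Linq;
using System.Text.RegularExpressions;

string source = @"
  <br />id: Value here
  <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">community</SPAN>: Value here
  <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">content</SPAN>
  <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">title</SPAN>: Value here";

// Group 1 is the key; group 2 is the rest of that line. [ \t]* and [^\r\n]* (rather than \s? and .*)
// stop a colon-less key such as "content" from consuming the line break and swallowing the next line.
var items = Regex.Matches(source, @"<br />(?:<SPAN[^>]*>)?([^<:]+)(?:</SPAN>)?:?[ \t]*([^\r\n]*)")
         .OfType<Match>()
         .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value)
         .ToList();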
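To make the two capture groups visible, here is a self-contained sketch that runs a line-bounded form of the answer's pattern over the same keys and values (built with explicit "\n" line breaks so the result does not depend on the source file's line endings) and prints each pair. As an assumption of this sketch, it collects the matches into a list of tuples instead of a dictionary: a list keeps document order and tolerates a repeated key, whereas `ToDictionary` throws an `ArgumentException` on a duplicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Same keys and values as the answer's sample, including the extra <br /> before "title".
string html =
    "<br />id: Value here\n" +
    "<br /><SPAN style=\"background-color: #A0FFFF; color: #000000\">community</SPAN>: Value here\n" +
    "<br /><SPAN style=\"background-color: #A0FFFF; color: #000000\">content</SPAN>\n" +
    "<br /><SPAN style=\"background-color: #A0FFFF; color: #000000\">title</SPAN>: Value here";

// Group 1: key (text after <br />, optionally inside one <SPAN>).
// Group 2: the rest of the same line; [^\r\n]* never crosses a line break.
var pattern = new Regex(@"<br />(?:<SPAN[^>]*>)?([^<:]+)(?:</SPAN>)?:?[ \t]*([^\r\n]*)");

// A list of (Key, Value) tuples preserves order and allows repeated keys.
List<(string Key, string Value)> pairs = pattern.Matches(html)
    .Cast<Match>()
    .Select(m => (Key: m.Groups[1].Value, Value: m.Groups[2].Value))
    .ToList();

foreach (var (key, value) in pairs)
    Console.WriteLine($"[{key}] -> [{value}]");
```

This prints four lines; the third is `[content] -> []`, since `content` has no colon and nothing follows it on its line.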

Upvotes: 2
