NateShoffner

Reputation: 16829

C# - Trim HTML Snippet inside WebBrowser

I am loading a specific web page in a WebBrowser control. Is there a way to find the following HTML within that page, save it as a string, and trim it down to just the number?

Here's an example:

HTML Snippet:

<div class="alertText">26 friends joined</div>

Trimmed:

26

I'm sorry for the very vague description, but I'm not really sure how to word this. Thank you.

Upvotes: 0

Views: 672

Answers (3)

Majkel

Reputation: 2263

Do you mean something like this:

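// Requires: using System.Text.RegularExpressions; using System.Windows.Forms;
// Call this after the DocumentCompleted event has fired so that Document is loaded.
string numberOfFriends = null;

HtmlElementCollection elems = webBrowser1.Document.GetElementsByTagName( "div" );
foreach( HtmlElement elem in elems )
{
  // The IE-based WebBrowser control exposes the HTML "class" attribute as "className"
  string className = elem.GetAttribute( "className" );
  if( "alertText".Equals( className ) )
  {
    // InnerText can be null for an empty element, and Regex.Match throws on null input
    string content = elem.InnerText ?? string.Empty;
    Match match = Regex.Match( content, @"(\d+) friends joined" );
    if( match.Success )
    {
      // Groups[ 0 ] is the entire match; Groups[ 1 ] is the captured number
      numberOfFriends = match.Groups[ 1 ].Value;
    }
  }
}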

I am not entirely sure the regexes are totally correct, but the rest should work.
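One way to check the regex part on its own is to pull it out of the WebBrowser code and run it against a plain string. This is a minimal sketch, assuming the text has already been obtained as a string; the `FriendsText` class and `ExtractFriendCount` method are hypothetical names, and `\s+` is used so that any run of whitespace between the words is accepted:

```csharp
using System.Text.RegularExpressions;

public static class FriendsText
{
    // Captures the digits that precede "friends joined", allowing any whitespace between words.
    private static readonly Regex CountPattern = new Regex(@"(\d+)\s+friends\s+joined");

    // Returns the number of friends, or null when the phrase is not present.
    public static int? ExtractFriendCount(string text)
    {
        if (text == null)
        {
            return null;
        }

        Match match = CountPattern.Match(text);
        if (!match.Success)
        {
            return null;
        }

        // Groups[1] holds the digits; TryParse guards against values too large for an int.
        int count;
        return int.TryParse(match.Groups[1].Value, out count) ? count : (int?)null;
    }
}
```

Calling `FriendsText.ExtractFriendCount(elem.InnerText)` inside the loop would then give the count as an `int?` instead of a string.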

Edit: Changed Groups[ 0 ] to Groups[ 1 ]. Groups[ 0 ] is the entire match, so the captured number is in Groups[ 1 ].

Edit 2: Changed elem.GetAttribute( "class" ) to elem.GetAttribute( "className" ) to fix the attribute name, and renamed the variable from class to className.
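To illustrate the Groups[ 0 ] / Groups[ 1 ] distinction, here is a small self-contained sketch (the `GroupDemo` name is just for illustration) that returns both values for the question's example markup:

```csharp
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns the whole match (Groups[0]) followed by the first capture group (Groups[1]).
    public static string[] WholeMatchAndCapture(string text)
    {
        Match match = Regex.Match(text, @"(\d+) friends joined");
        // Groups[0] is always the entire matched text; numbered capture groups start at 1.
        return new[] { match.Groups[0].Value, match.Groups[1].Value };
    }
}
```

For `<div class="alertText">26 friends joined</div>` the first element is `26 friends joined` and the second is `26`.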

Upvotes: 0

Casper Broeren

Reputation: 780

I would say this is a better regex:

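// Requires: using System; using System.Text.RegularExpressions;
string html = webBrowser1.Document.Body.OuterHtml;
string pattern = @"(\d+)\sfriends\sjoined";
int friendsJoined = 0;
foreach (Match m in Regex.Matches(html, pattern))
{
    friendsJoined = Convert.ToInt32(m.Groups[1].Value);
}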
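A likely reason `\s` is the safer choice: in raw HTML the words may be separated by a line break or tab rather than a single space. This sketch (the `WhitespaceDemo` helper is hypothetical) compares a literal-space pattern with the `\s` version:

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Literal spaces: matches only when each gap is exactly one space character.
    public const string LiteralSpaces = @"(\d+) friends joined";

    // \s matches any single whitespace character (space, tab, newline, and so on).
    public const string WhitespaceClass = @"(\d+)\sfriends\sjoined";

    public static bool Matches(string text, string pattern)
    {
        return Regex.IsMatch(text, pattern);
    }
}
```

Note that `\s` still matches exactly one character, so two consecutive spaces would not match; `\s+` would tolerate runs of whitespace.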

Upvotes: 0

T. Stone

Reputation: 19485

Why not just search the HTML with a regex right off the bat, instead of enumerating HtmlElement objects?

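// Requires: using System; using System.Text.RegularExpressions;
string html = webBrowser1.Document.Body.OuterHtml;
// Quotes are doubled inside a C# verbatim string; \d+ accepts a count of any length
string pattern = @"<div class=""alertText"">(\d+) friends joined</div>";
int friendsJoined = 0;
foreach (Match m in Regex.Matches(html, pattern))
{
    friendsJoined = Convert.ToInt32(m.Groups[1].Value);
}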

If you want the scraping to be less dependent on the exact markup, you could drop the outer bits:

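// Requires: using System; using System.Text.RegularExpressions;
string html = webBrowser1.Document.Body.OuterHtml;
// Anchors only on the end of the opening tag and the start of the closing tag
string pattern = @">(\d+) friends joined</";
int friendsJoined = 0;
foreach (Match m in Regex.Matches(html, pattern))
{
    friendsJoined = Convert.ToInt32(m.Groups[1].Value);
}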
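To see how much the looser pattern tolerates, here is a sketch that collects every count from a string of markup. It uses `\d+` so counts of 100 or more also match, and assumes each count fits in an `int`; the `LoosePatternDemo` name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LoosePatternDemo
{
    // Anchors only on ">" (end of the opening tag) and "</" (start of the closing tag),
    // so the element name, attribute quoting, and tag case do not affect the match.
    private static readonly Regex Loose = new Regex(@">(\d+) friends joined</");

    // Returns every captured count, in document order.
    public static List<int> FindCounts(string html)
    {
        var counts = new List<int>();
        foreach (Match m in Loose.Matches(html))
        {
            counts.Add(int.Parse(m.Groups[1].Value));
        }
        return counts;
    }
}
```

For example, `<DIV class=alertText>26 friends joined</DIV>` and `<span id="x">140 friends joined</span>` both match, which the full `<div class="alertText">` pattern would not.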

Upvotes: 1
