Reputation: 16829
I am loading a specific web page in a WebBrowser control. Is there a way to take a piece of HTML from that page, save it as a string, and trim it down to just the part I need?
Here's an example:
HTML Snippet:
<div class="alertText">26 friends joined</div>
Trimmed:
26
I'm sorry for the very vague description, but I'm not really sure how to word this. Thank you.
Upvotes: 0
Views: 672
Reputation: 2263
Do you mean something like this:
string numberOfFriends = null;
HtmlElementCollection elems = webBrowser1.Document.GetElementsByTagName( "div" );
foreach( HtmlElement elem in elems )
{
    // The WebBrowser DOM exposes the HTML class attribute as "className".
    string className = elem.GetAttribute( "className" );
    if( "alertText".Equals( className ) )
    {
        // Match once and reuse the result; guard against a null InnerText.
        Match match = Regex.Match( elem.InnerText ?? string.Empty, "(\\d+) friends joined" );
        if( match.Success )
        {
            // Groups[ 1 ] is the first capture group, i.e. just the digits.
            numberOfFriends = match.Groups[ 1 ].Value;
        }
    }
}
I am not entirely sure the regex is correct, but the rest should work.
Edit: Changed Groups[ 0 ] to Groups[ 1 ] - IIRC the first group is the entire match.
Edit 2: Changed elem.GetAttribute( "class" ) to elem.GetAttribute( "className" ) - fixed the name of the attribute and the variable name (class to className).
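The difference between the two group indexes is easy to check on the sample text from the question. This is only a minimal sketch: the class GroupIndexDemo and its method names are made up for illustration and are not part of the answer's code.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupIndexDemo
{
    // Same pattern as in the answer, written as a verbatim string.
    const string Pattern = @"(\d+) friends joined";

    // Hypothetical helper: Groups[ 0 ] holds the entire matched text.
    public static string WholeMatch( string content )
    {
        Match m = Regex.Match( content, Pattern );
        return m.Success ? m.Groups[ 0 ].Value : null;
    }

    // Hypothetical helper: Groups[ 1 ] holds only the first capture group (the digits).
    public static string FirstGroup( string content )
    {
        Match m = Regex.Match( content, Pattern );
        return m.Success ? m.Groups[ 1 ].Value : null;
    }
}
```

For the input "26 friends joined", WholeMatch returns "26 friends joined" and FirstGroup returns "26". Both return null when the text does not match.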
Upvotes: 0
Reputation: 780
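I would say that this is a better regex match:
// Assumes the WinForms WebBrowser control; Body.OuterHtml is the markup of the <body> element.
string html = WebBrowser1.Document.Body.OuterHtml;
string pattern = @"(\d+)\sfriends\sjoined";
int friendsJoined = 0;
foreach( Match m in Regex.Matches( html, pattern ) )
{
    friendsJoined = Convert.ToInt32( m.Groups[ 1 ].Value );
}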
Upvotes: 0
Reputation: 19485
Why not just search the HTML with regex right off the bat, instead of enumerating HtmlElement types?
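// Assumes the WinForms WebBrowser control; Body.OuterHtml is the markup of the <body> element.
string html = WebBrowser1.Document.Body.OuterHtml;
// In a verbatim string, a literal double quote is written as "".
string pattern = @"<div class=""alertText"">(\d{1,2}) friends joined</div>";
int friendsJoined = 0;
foreach( Match m in Regex.Matches( html, pattern ) )
{
    friendsJoined = Convert.ToInt32( m.Groups[ 1 ].Value );
}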
If you wanted the scraping to be less dependent on the HTML, you could drop the outer bits of the pattern:
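// Assumes the WinForms WebBrowser control; Body.OuterHtml is the markup of the <body> element.
string html = WebBrowser1.Document.Body.OuterHtml;
string pattern = @">(\d{1,2}) friends joined</";
int friendsJoined = 0;
foreach( Match m in Regex.Matches( html, pattern ) )
{
    friendsJoined = Convert.ToInt32( m.Groups[ 1 ].Value );
}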
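To see what dropping the outer bits buys, the looser pattern can be tried against markup where the wrapping element has changed. This is a rough sketch that assumes the HTML is already available as a string. The class LoosePatternDemo, the method FriendsJoined, and the -1 sentinel are all invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class LoosePatternDemo
{
    // The looser pattern: only the text between ">" and "</" matters, not the tag around it.
    const string Pattern = @">(\d{1,2}) friends joined</";

    // Returns the captured number, or -1 when nothing matches (the sentinel is an assumption).
    public static int FriendsJoined( string html )
    {
        Match m = Regex.Match( html, Pattern );
        return m.Success ? Convert.ToInt32( m.Groups[ 1 ].Value ) : -1;
    }
}
```

With this helper, both "<div class=\"alertText\">26 friends joined</div>" and "<span id=\"alert\">26 friends joined</span>" yield 26. Because the pattern uses \d{1,2}, a three-digit count such as ">126 friends joined</" does not match and yields -1. Widening the quantifier to \d+ would be one way to lift that limit.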
Upvotes: 1