Reputation: 31
I'm trying to extract a single value from an HTML page.
For example, there are two text boxes (1 and 2). I want to enter a Stack Overflow question ID in textBox1 and get that question's "asked" value in textBox2.
For example, if I enter 36 in textBox1, textBox2 should show "9 years, 4 months ago".
WebClient webpage = new WebClient();
String html = webpage.DownloadString("https://stackoverflow.com/questions/" + textBox1.Text);
MatchCollection match = Regex.Matches(html, FILTERHERE, RegexOptions.Singleline);
The problem is that I don't know what pattern to use to filter the output (FILTERHERE).
Also, how can I send the result to textBox2?
Upvotes: 1
Views: 391
Reputation: 6683
With HtmlAgilityPack.
string url = "https://stackoverflow.com/questions/";
var web = new HtmlWeb();
var doc = web.Load(url + textBox1.Text); //the text is "36"
var tag = doc.DocumentNode.SelectSingleNode("//*[@id='qinfo']//td[./p[@class='label-key' and text()='asked']]/following-sibling::td//b");
// SelectSingleNode returns null when nothing matches, so guard against that
textBox2.Text = tag?.InnerText ?? string.Empty;
If you don't know XPath, there are browser extensions for Chrome and Firefox that get the XPath of any HTML tag for you (I personally write them manually to make them less sensitive to changes in page structure).
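To try the XPath without hitting the network, here is a minimal sketch that runs the same expression against an inline HTML string. The `AskedExtractor` class, the `ExtractAsked` helper, and the sample markup are made up for illustration; the markup only imitates the `qinfo` table the XPath expects and is not copied from the real page. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class AskedExtractor
{
    // Same XPath as in the snippet above.
    private const string AskedXPath =
        "//*[@id='qinfo']//td[./p[@class='label-key' and text()='asked']]/following-sibling::td//b";

    // Hypothetical helper: returns the "asked" text, or null when the XPath matches nothing.
    public static string ExtractAsked(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var tag = doc.DocumentNode.SelectSingleNode(AskedXPath);
        if (tag == null)
            return null;

        // Decode HTML entities (e.g. &nbsp;) and trim surrounding whitespace.
        return HtmlEntity.DeEntitize(tag.InnerText).Trim();
    }

    public static void Main()
    {
        // Made-up sample that only mimics the structure the XPath looks for.
        const string sample =
            "<table id='qinfo'><tr>" +
            "<td><p class='label-key'>asked</p></td>" +
            "<td><p class='label-key'><b>9 years, 4 months ago</b></p></td>" +
            "</tr></table>";

        Console.WriteLine(ExtractAsked(sample) ?? "not-found");
    }
}
```

Returning null instead of throwing makes it easy to tell "the page layout changed" apart from a real value before writing anything into textBox2.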
Upvotes: 3
Reputation: 14053
In a Windows Forms application, the WebBrowser control can be used; it wraps the mshtml library and exposes a managed HTML DOM. Example of a function which retrieves the "asked" text:
private static string GetAskedText(HtmlDocument doc)
{
    if (doc == null)
        return "document-null";

    // Get every <div> as the underlying mshtml element
    IEnumerable<mshtml.HTMLDivElement> divs = doc.GetElementsByTagName("div")
        .OfType<HtmlElement>()
        .Select(e => e.DomElement as mshtml.HTMLDivElement);

    foreach (var div in divs)
    {
        // Only the div with class "user-info" is of interest
        if (string.IsNullOrWhiteSpace(div?.className))
            continue;
        if (div.className.Trim().ToLower() != "user-info")
            continue;

        // Inside it, the span with class "relativetime" holds the "asked" text
        var spans = div.getElementsByTagName("span").OfType<mshtml.HTMLSpanElement>();
        foreach (var span in spans)
        {
            if (string.IsNullOrWhiteSpace(span?.className))
                continue;
            if (span.className == "relativetime")
            {
                return span.innerText;
            }
        }
    }
    return "not-found";
}
A complete Windows Forms example can be downloaded from my Dropbox.
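For readers who only want to see how such a function is wired up, here is a rough sketch of a form that navigates the WebBrowser control and fills textBox2 once the page has loaded. It is not the downloadable example. The `AskedForm` class, the `BuildQuestionUrl` helper, and the control layout are invented for illustration. `GetAskedText` is rewritten here against the managed `HtmlElement` API, so the sketch needs no mshtml reference. The `user-info` and `relativetime` class names come from the function above and may not match the live page.

```csharp
using System;
using System.Windows.Forms;

public class AskedForm : Form
{
    private readonly TextBox textBox1 = new TextBox { Left = 10, Top = 10, Width = 200 };
    private readonly TextBox textBox2 = new TextBox { Left = 10, Top = 40, Width = 200, ReadOnly = true };
    private readonly Button goButton = new Button { Left = 220, Top = 8, Text = "Go" };
    private readonly WebBrowser browser = new WebBrowser
    {
        Left = 10, Top = 70, Width = 300, Height = 150, ScriptErrorsSuppressed = true
    };

    public AskedForm()
    {
        Width = 340;
        Height = 280;
        Controls.AddRange(new Control[] { textBox1, textBox2, goButton, browser });

        // Start loading the question page when the button is clicked.
        goButton.Click += (sender, e) => browser.Navigate(BuildQuestionUrl(textBox1.Text));

        // DocumentCompleted also fires for frames; react only to the top-level document.
        browser.DocumentCompleted += (sender, e) =>
        {
            if (e.Url != browser.Url)
                return;
            textBox2.Text = GetAskedText(browser.Document);
        };
    }

    // Hypothetical helper: builds the question URL from the ID typed into textBox1.
    public static string BuildQuestionUrl(string questionId)
    {
        return "https://stackoverflow.com/questions/" + Uri.EscapeDataString(questionId.Trim());
    }

    // Same idea as the function above, using the managed HtmlElement API instead of mshtml.
    private static string GetAskedText(HtmlDocument doc)
    {
        if (doc == null)
            return "document-null";

        foreach (HtmlElement div in doc.GetElementsByTagName("div"))
        {
            // The class attribute is exposed under the name "className".
            string divClass = (div.GetAttribute("className") ?? string.Empty).Trim();
            if (!string.Equals(divClass, "user-info", StringComparison.OrdinalIgnoreCase))
                continue;

            foreach (HtmlElement span in div.GetElementsByTagName("span"))
            {
                if (span.GetAttribute("className") == "relativetime")
                    return span.InnerText;
            }
        }
        return "not-found";
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new AskedForm());
    }
}
```

Reading the document inside `DocumentCompleted` rather than right after `Navigate` matters because navigation is asynchronous; the DOM is not available until that event fires.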
Upvotes: 2