ikathegreat

Reputation: 2331

efficient method of downloading webpage info

Before you stop reading and suggest HTML Agility (based on the title), I am already using this tool. The problem is this: I have a webpage that lists a whole bunch of case numbers and has links to the individual case number pages. My app already downloads this info and displays it in a DataGridView. However, in my app I also need information from the individual case number pages (the links).

The problem is I already know it's going to take forever to acquire with HTML Agility: retrieving a single case page takes about 2 minutes. Code-wise, I feed HTML Agility the HTML, add the cell values to an array, and parse out the array indexes I need to display in my grid. That is a very large array to parse, given the number of components on the page.

Any ideas to acquire the main page and specific cells from the linked pages?
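One way to attack the delay, sketched below, is to fetch the linked case pages concurrently with HttpClient and Task.WhenAll, then hand each response to HtmlAgilityPack and pull only the cell you need with XPath. The URL list, the td class name, and the GetCaseCellsAsync helper are all hypothetical placeholders, not anything from the actual site:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class CaseScraper
{
    // One shared HttpClient; reusing it avoids socket exhaustion.
    static readonly HttpClient client = new HttpClient();

    // Hypothetical helper: download all case pages concurrently
    // instead of one page at a time.
    static async Task<Dictionary<string, string>> GetCaseCellsAsync(
        IEnumerable<string> caseUrls)
    {
        var tasks = caseUrls.Select(async url =>
        {
            string html = await client.GetStringAsync(url);
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            // Hypothetical XPath: select just the one cell needed,
            // rather than walking every element into an array.
            var cell = doc.DocumentNode.SelectSingleNode(
                "//td[@class='case-status']");
            return (url, text: cell?.InnerText.Trim() ?? "");
        });
        var results = await Task.WhenAll(tasks);
        return results.ToDictionary(r => r.url, r => r.text);
    }
}
```

Because the work is network-bound, the fetches overlap, so N pages take roughly the time of the slowest single request rather than the sum of all of them.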

Upvotes: 0

Views: 95

Answers (1)

HatSoft

Reputation: 11201

Examples showing how you can use XPath in HtmlAgilityPack:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(yourHtml);

Example 1: // The below example will get all divs with class "container"
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='container']"))
{
    Console.WriteLine(node.InnerText);
}

Example 2: // The below example will get the first div with class "container"
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='container'][1]");
Console.WriteLine(node.InnerText);

You can use XPath queries to get the element(s) you want.
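Applied to the question's case pages, a targeted XPath can pull a single table cell directly instead of loading every value into an array first. The table id and row/column positions below are assumptions; adjust them to the real page layout:

```csharp
// Hypothetical table id and cell position; inspect the actual
// case page's HTML to find the real ones.
HtmlNode cell = doc.DocumentNode.SelectSingleNode(
    "//table[@id='caseDetails']//tr[2]/td[3]");
if (cell != null)
    Console.WriteLine(cell.InnerText.Trim());
```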

For XPath syntax and more, please use the link http://www.w3schools.com/xpath/xpath_syntax.asp

Upvotes: 1
