user2233501

Reputation: 51

When getting data using HTML Agility Pack in C#, sometimes the app gets no data

I am using HTML Agility Pack to get data from a web page by selecting certain tags, but I have run into a problem: sometimes the page times out and the app gets no data. How can I fix this? At the moment I have to refresh the page again and again.

Here's my code:

string Url = "http://gmail.com";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(Url);
// SelectNodes returns null when nothing matches the XPath
var SpanNodes = doc.DocumentNode.SelectNodes("//div[@class='form-field wide-80 normal']");

How can I refresh the web page from C#, or how can I get the data with HTML Agility Pack by opening the page in a web browser? Please guide me on how to do this.

Upvotes: 0

Views: 942

Answers (2)

Mr47

Reputation: 2655

Building a so-called scraper that refreshes a web page very frequently might get you temporarily banned, since sites block such clients to avoid burdening their server(s) too much.
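
One way to avoid hammering the server when a load times out is to retry only a few times and wait longer after each failure. The following is a minimal sketch, not part of the original answer: `Retry.Run` is a hypothetical helper name, and catching every `Exception` is an assumption made for brevity (you may want to narrow it to the exception type your timeouts actually raise).

```csharp
using System;
using System.Threading;

static class Retry
{
    // Hypothetical helper: runs `action` up to `maxAttempts` times.
    // After each failed attempt it sleeps, then doubles the delay.
    // The exception from the final attempt is allowed to propagate.
    public static T Run<T>(Func<T> action, int maxAttempts, TimeSpan initialDelay)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Assumption: any failure is worth retrying.
                Thread.Sleep(delay);
                delay += delay; // exponential backoff
            }
        }
    }
}
```

With the code from the question, a call could look like `var doc = Retry.Run(() => web.Load(Url), 3, TimeSpan.FromSeconds(2));`, which gives up after the third failed attempt instead of refreshing indefinitely.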

Upvotes: 0

David

Reputation: 16277

HTML Agility Pack is good at parsing data from a web page. If you want to automate or control a page (e.g. navigate, refresh), consider using Selenium.

IWebDriver driver = new OpenQA.Selenium.Firefox.FirefoxDriver();
driver.Navigate().GoToUrl(url);   // url: the address of the page to load
driver.Navigate().Refresh();      // <--- here it gets refreshed
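
The two libraries can also be combined: Selenium loads the page in a real browser, and HTML Agility Pack parses the resulting markup. A rough sketch, assuming the Selenium WebDriver and HtmlAgilityPack packages are installed and Firefox with its driver is available; the `BrowserScrape` class name is made up for illustration, and the URL and XPath are taken from the question.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class BrowserScrape
{
    static void Main()
    {
        string url = "http://gmail.com"; // URL from the question

        // Disposing the driver closes the browser window.
        using (IWebDriver driver = new FirefoxDriver())
        {
            driver.Navigate().GoToUrl(url);

            // Hand the browser's current page source to HTML Agility Pack.
            var doc = new HtmlDocument();
            doc.LoadHtml(driver.PageSource);

            var nodes = doc.DocumentNode.SelectNodes(
                "//div[@class='form-field wide-80 normal']");

            // SelectNodes returns null when nothing matches.
            if (nodes == null)
            {
                Console.WriteLine("No matching nodes found.");
                return;
            }

            foreach (var node in nodes)
            {
                Console.WriteLine(node.InnerText.Trim());
            }
        }
    }
}
```

If the first load comes back empty, `driver.Navigate().Refresh()` can be called before reading `PageSource` again.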

BTW: frequently refreshing or crawling a URL is in most cases infeasible. As your traffic grows, most websites will prompt you to solve a CAPTCHA, and you will have a hard time grabbing any further data. This might be off topic, though. :)

Upvotes: 1
