Reputation: 51
I have been using the HTML Agility Pack to get data from a web page by selecting some tags, but I have run into a problem: sometimes the web page times out and the app doesn't get any data. How can I fix this? Right now I have to refresh the page again and again.
Here's my code:
string Url = "http://gmail.com";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(Url);
var SpanNodes = doc.DocumentNode.SelectNodes("//div[@class='form-field wide-80 normal']");
How can I refresh the web page in the browser using C#, or how can I get the data with the HTML Agility Pack by opening the page in a web browser? Please guide me on how to do this.
Upvotes: 0
Views: 942
Reputation: 2655
Building a so-called scraper that refreshes a web page very frequently might get you temporarily banned, since sites do this to avoid burdening their server(s) too much.
Upvotes: 0
Reputation: 16277
The HTML Agility Pack is good at parsing data from a web page. If you want to automate or control a web page (e.g. navigate, refresh), consider using Selenium.
IWebDriver driver = new OpenQA.Selenium.Firefox.FirefoxDriver();
driver.Navigate().GoToUrl(url);
driver.Navigate().Refresh(); // <--- here it gets refreshed
BTW: frequently refreshing or crawling a URL is infeasible in most cases. As your traffic grows, most websites will prompt you to enter a CAPTCHA, and you will have a hard time grabbing any further data from them. This might be off topic, though. :)
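If a real browser is not needed and the only problem is the occasional timeout, retrying the load in code with a growing pause between attempts may be enough. Below is a minimal sketch of that idea, not a definitive implementation. It assumes the HtmlAgilityPack NuGet package is installed and that a timeout surfaces as a WebException, which can differ by HtmlAgilityPack version and runtime. The RetryingLoader class, the WithRetry helper, the attempt count and the delay are all hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Net;
using System.Threading;
using HtmlAgilityPack;

static class RetryingLoader
{
    // Hypothetical helper (not part of HtmlAgilityPack): runs `action` and,
    // if it throws a WebException, waits and tries again, doubling the delay
    // each time. After maxAttempts failures the last exception propagates.
    public static T WithRetry<T>(Func<T> action, int maxAttempts, TimeSpan initialDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (WebException) when (attempt < maxAttempts)
            {
                // Assumption: timeouts arrive as WebException.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }

    static void Main()
    {
        string url = "http://gmail.com";   // URL from the question
        var web = new HtmlWeb();

        // Up to 3 attempts, pausing 2 s and then 4 s between them (assumed values).
        HtmlDocument doc = WithRetry(() => web.Load(url), 3, TimeSpan.FromSeconds(2));

        // SelectNodes returns null when nothing matches, so guard against it.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='form-field wide-80 normal']");
        Console.WriteLine(nodes == null ? 0 : nodes.Count);
    }
}
```

The pause between attempts also keeps the retries from hammering the server, which matters given the ban and CAPTCHA risk mentioned above.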
Upvotes: 1