Reputation: 45898
What solutions exist for screen scraping a site over SSL for use with .NET?
My use case is that I need to log in to a partner website (https), navigate through a dynamic hierarchy, and download a zipped file of reports.
I could certainly use other screen scrapers if there are no viable options in .NET, either through the framework or OSS.
Upvotes: 9
Views: 2189
Reputation: 6809
You can certainly do this with HttpWebRequest, but keeping track of the cookies used for logging in may be non-trivial. I would recommend using Watir (Ruby) or WatiN (C#). Both will handle all of that for you.
From the WatiN website, here is an example:
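public void SearchForWatiNOnGoogle()
{
    // Open Internet Explorer on the Google home page; disposing the IE object closes it.
    using (IE ie = new IE("http://www.google.com"))
    {
        // Type the query into the search box and submit it.
        ie.TextField(Find.ByName("q")).TypeText("WatiN");
        ie.Button(Find.ByName("btnG")).Click();
        // Check that the results page mentions WatiN.
        Assert.IsTrue(ie.ContainsText("WatiN"));
    }
}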
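For comparison, the HttpWebRequest route mentioned above mostly comes down to sharing one CookieContainer across every request, so the session cookie set at login is sent back on later requests. The following is only a minimal sketch under stated assumptions: the URLs, the form field names (`username`, `password`), the output file name, and the `PartnerSiteSession` class are all hypothetical, and a real partner site may also require hidden form fields or redirects that are not handled here.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: keeps one cookie jar for the whole session.
class PartnerSiteSession
{
    private readonly CookieContainer cookies = new CookieContainer();

    // Every request shares the same CookieContainer, so cookies set by
    // the login response are replayed automatically afterwards.
    private HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }

    // Posts a URL-encoded login form. The field names are assumptions.
    public void Login(string loginUrl, string user, string password)
    {
        HttpWebRequest request = Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes(
            "username=" + Uri.EscapeDataString(user) +
            "&password=" + Uri.EscapeDataString(password));
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        // Reading the response is what stores the session cookies.
        using (WebResponse response = request.GetResponse()) { }
    }

    // Streams the response body (for example a zip file) to disk.
    public void Download(string url, string path)
    {
        HttpWebRequest request = Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (FileStream output = File.Create(path))
        {
            input.CopyTo(output);
        }
    }

    static void Main()
    {
        // Placeholder URLs and credentials, for illustration only.
        var session = new PartnerSiteSession();
        session.Login("https://partner.example.com/login", "user", "secret");
        session.Download("https://partner.example.com/reports/latest.zip", "reports.zip");
    }
}
```

HTTPS needs no extra code here as long as the server certificate is trusted by the machine; the dynamic hierarchy in between would still have to be walked by fetching and parsing each page.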
Upvotes: 4
Reputation: 25813
I've heard of people hosting the browser in their program and scraping with jQuery. That seems like a good fit to me, since jQuery is well suited to searching the DOM.
Upvotes: 2
Reputation: 46643
The gold standard for screen scraping in .NET is the HTML Agility Pack.
As far as retrieving pages over HTTPS, try this article:
(As mentioned in other answers, you may actually be after automation rather than screen scraping, in which case you may be better off with WatiN, a framework originally designed for automated web testing but flexible enough for what you want.)
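As a rough illustration of what the HTML Agility Pack does once a page has been retrieved, the sketch below parses an HTML string and lists the links that end in `.zip`. The sample markup, the `ReportLinkFinder` class name, and the `.zip` filter are assumptions made for the example; it relies on the HtmlAgilityPack NuGet package being referenced.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class ReportLinkFinder
{
    static void Main()
    {
        // Made-up markup standing in for a page fetched from the partner site.
        string html =
            "<html><body>" +
            "<a href='/reports/2009.zip'>2009 reports</a>" +
            "<a href='/help'>Help</a>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath query for every anchor that has an href attribute.
        // SelectNodes returns null (not an empty list) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return;
        }

        foreach (HtmlNode link in links)
        {
            string href = link.GetAttributeValue("href", "");
            if (href.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                Console.WriteLine(href);
            }
        }
    }
}
```

The library only parses markup; fetching the page and keeping the login session alive still has to be done separately.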
Upvotes: 8
Reputation: 20053
Perhaps consider WatiN to simulate navigating the site, or WebClient if you can find the items yourself and reproduce the navigation logic in code.
Upvotes: 6