Even Mien

Reputation: 45898

Screen scraping over SSL with .NET

What solutions exist for screen scraping a site over SSL for use with .NET?

My use case is that I need to log in to a partner website (https), navigate through a dynamic hierarchy, and download a zipped file of reports.

I could certainly use other screen scrapers if there are no viable options in .NET, either through the framework or OSS.

Upvotes: 9

Views: 2189

Answers (4)

ConsultUtah

Reputation: 6809

You can certainly do this with HttpWebRequest, but keeping track of the cookies used for logging in may be non-trivial. I would recommend using Watir (Ruby) or WatiN (C#). Both will handle all of that for you.

From the WatiN website, here is an example:

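public void SearchForWatiNOnGoogle()
{
    // Opens a real Internet Explorer window; disposing it closes the browser.
    using (IE ie = new IE("http://www.google.com"))
    {
        // Type the query into the search box and submit it.
        ie.TextField(Find.ByName("q")).TypeText("WatiN");
        ie.Button(Find.ByName("btnG")).Click();

        // Check that the results page mentions the search term.
        Assert.IsTrue(ie.ContainsText("WatiN"));
    }
}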
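
For comparison, here is a rough sketch of the HttpWebRequest route mentioned above. The cookie bookkeeping mostly comes down to sharing one CookieContainer across requests. The URLs, the form field names (username, password) and the credentials are made up for illustration; a real site may also need hidden form fields or anti-forgery tokens that this sketch does not handle.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PartnerSession
{
    // Hypothetical URLs; replace with the partner site's real ones.
    const string LoginUrl = "https://partner.example.com/login";
    const string ReportUrl = "https://partner.example.com/reports/latest.zip";

    // Builds a URL-encoded form body. The field names are assumptions.
    public static string BuildFormBody(string username, string password)
    {
        return "username=" + Uri.EscapeDataString(username) +
               "&password=" + Uri.EscapeDataString(password);
    }

    static void Main()
    {
        // One container shared by every request carries the login cookie forward.
        var cookies = new CookieContainer();

        // POST the login form over HTTPS.
        var login = (HttpWebRequest)WebRequest.Create(LoginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody("user", "secret"));
        login.ContentLength = body.Length;
        using (Stream requestStream = login.GetRequestStream())
            requestStream.Write(body, 0, body.Length);
        // Cookies set by the response are stored in the shared container.
        using (login.GetResponse()) { }

        // Request the zip with the same container so the session cookie is sent.
        var download = (HttpWebRequest)WebRequest.Create(ReportUrl);
        download.CookieContainer = cookies;
        using (WebResponse response = download.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (FileStream output = File.Create("reports.zip"))
            input.CopyTo(output);
    }
}
```

Walking the dynamic hierarchy would mean repeating the second request for each intermediate page and pulling the next URL out of the returned HTML.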

Upvotes: 4

Lance Fisher

Reputation: 25813

I've heard of people hosting the browser in their program and scraping with jQuery. That seems like a good approach to me, since jQuery is well suited to searching the DOM.

Upvotes: 2

Colin Pickard

Reputation: 46643

The gold standard for screen scraping in .NET is the HTML Agility Pack.

As far as retrieving pages over HTTPS, try this article:

(As mentioned in other answers, you may actually be after automation rather than screen scraping, in which case you may be better off with WatiN, a framework originally designed for automated web testing but flexible enough for what you want.)
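
To give an idea of what the HTML Agility Pack part looks like, here is a minimal sketch that pulls the links to .zip files out of a page. It assumes the HtmlAgilityPack NuGet package is referenced; the LinkExtractor class, the FindZipLinks helper and the sample markup are invented for illustration, and fetching the page itself is left out.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

static class LinkExtractor
{
    // Returns the href of every anchor whose target ends in ".zip".
    public static List<string> FindZipLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", "");
            if (href.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
                links.Add(href);
        }
        return links;
    }

    static void Main()
    {
        // Made-up markup standing in for a page fetched from the partner site.
        string html = "<ul><li><a href='/reports/2009-q1.zip'>Q1</a></li>" +
                      "<li><a href='/help.html'>Help</a></li></ul>";
        foreach (string link in FindZipLinks(html))
            Console.WriteLine(link);
    }
}
```

The same XPath approach can be used at each level of the hierarchy to find the link to follow next.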

Upvotes: 8

Jeff Moser

Reputation: 20053

Perhaps consider WatiN to simulate navigating, or WebClient if you can find the items yourself and reproduce the request logic in code.
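
One catch with WebClient is that it does not carry cookies from one call to the next by default, so a login will not stick. A common workaround is to subclass it and attach a shared CookieContainer to every request. The sketch below assumes that approach; the CookieAwareWebClient name, the URLs and the form fields are all hypothetical.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// WebClient subclass that reuses one cookie container for every request it makes.
class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    // Exposed so callers can inspect the session cookies if needed.
    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests support cookies; leave other schemes untouched.
        var http = request as HttpWebRequest;
        if (http != null)
            http.CookieContainer = cookies;
        return request;
    }
}

static class Program
{
    static void Main()
    {
        using (var client = new CookieAwareWebClient())
        {
            // Hypothetical login form fields and credentials.
            var form = new NameValueCollection
            {
                { "username", "user" },
                { "password", "secret" }
            };
            // UploadValues sends the form as a URL-encoded POST.
            client.UploadValues("https://partner.example.com/login", form);

            // The session cookie from the login is sent along with the download.
            client.DownloadFile("https://partner.example.com/reports/latest.zip",
                                "reports.zip");
        }
    }
}
```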

Upvotes: 6
