Greg

Reputation: 34798

C# code for saving an entire web page? (with images/formatting)

I've been struggling to find an example of some C# code (I'm using Visual C# 2008 Express) that can programmatically save an entire web page (given a URL), including the images and formatting (e.g. CSS). The intention is that in a subsequent phase I'd ship this off (not sure how yet) so it could be viewed later in a browser.

Is there an example of the simplest approach (leveraging the .NET Framework methods) to save an entire web page? It could be saved as one page with a subdirectory for images, or otherwise. Basically the same as what you get with browsers when you say "save entire web page".

Upvotes: 5

Views: 14536

Answers (3)

Ash

Reputation: 62096

The simplest way is probably to add a WebBrowser Control to your application and point it at the page you want to save using the Navigate() method.

Then, when the document has loaded, call the ShowSaveAsDialog method. The user can then save the page as a single file, or a file with images in a subdirectory.
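A minimal sketch of this, assuming a Windows Forms project; the class name `SavePageDemo` and the URL are placeholders rather than anything from the question. It waits for the top-level document to finish loading and then opens the browser's own Save As dialog, so the user still has to confirm the save:

```csharp
using System;
using System.Windows.Forms;

static class SavePageDemo
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        Form form = new Form();
        form.Width = 1024;
        form.Height = 768;

        WebBrowser browser = new WebBrowser();
        browser.Dock = DockStyle.Fill;
        browser.ScriptErrorsSuppressed = true;
        form.Controls.Add(browser);

        bool dialogShown = false;
        browser.DocumentCompleted += delegate(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // DocumentCompleted also fires for frames, so react only to the
            // top-level document, and only once.
            if (dialogShown || e.Url != browser.Url) return;
            dialogShown = true;
            browser.ShowSaveAsDialog();
        };

        browser.Navigate("http://example.com/"); // placeholder URL
        Application.Run(form);
    }
}
```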

[Update]

Having now noticed "programmatically" in your question, the above approach is not ideal, as it requires either user involvement or delving into the Windows API to send input using SendKeys or similar.

There is nothing built-in to the .NET Framework that does all of what you ask.

So my revised approach would be:

  • Use System.Net.HttpWebRequest to get the main HTML document as a string or stream (easy).
  • Load this into an HtmlAgilityPack document, where you can easily query the document to get lists of all image elements, stylesheet links, etc.
  • Then make a separate web request for each of these files and save them to a subdirectory.
  • Finally, update all relevant links in the main page to point to the items in the subdirectory.
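The steps above can be sketched roughly as follows. This is an illustration under several assumptions, not a complete solution. It assumes the third-party HtmlAgilityPack library is referenced and the page is UTF-8. It handles only `<img src>` and `<link rel="stylesheet" href>`. The names `PageSaver`, `LocalName`, `Localize`, `page_files` and `index.html` are made up for the example. It does not follow images referenced from inside CSS, scripts, or anything loaded by JavaScript.

```csharp
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack; // third-party library, not part of the .NET Framework

static class PageSaver
{
    const string FilesDirName = "page_files"; // hypothetical subdirectory name

    // Builds a local file name; the index prefix avoids clashes between
    // resources that share a file name.
    public static string LocalName(Uri resource, int index)
    {
        string name = Path.GetFileName(resource.AbsolutePath);
        if (string.IsNullOrEmpty(name)) name = "resource";
        return index + "_" + name;
    }

    public static void Save(string url, string targetDir)
    {
        Uri pageUri = new Uri(url);
        string filesDir = Path.Combine(targetDir, FilesDirName);
        Directory.CreateDirectory(filesDir);

        // Step 1: fetch the main HTML document (assumes UTF-8 content).
        string html;
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(pageUri);
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            html = reader.ReadToEnd();
        }

        // Step 2: parse it so elements can be queried with XPath.
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Steps 3 and 4: download each resource and rewrite its link.
        using (WebClient client = new WebClient())
        {
            int index = 0;
            Localize(doc, "//img[@src]", "src", pageUri, filesDir, client, ref index);
            Localize(doc, "//link[@rel='stylesheet' and @href]", "href",
                     pageUri, filesDir, client, ref index);
        }

        doc.Save(Path.Combine(targetDir, "index.html"));
    }

    static void Localize(HtmlDocument doc, string xpath, string attribute,
                         Uri pageUri, string filesDir, WebClient client, ref int index)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
        {
            string reference = node.GetAttributeValue(attribute, "");
            if (reference.Length == 0) continue;
            try
            {
                Uri resource = new Uri(pageUri, reference); // resolves relative URLs
                if (resource.Scheme != Uri.UriSchemeHttp &&
                    resource.Scheme != Uri.UriSchemeHttps) continue; // e.g. data: URIs

                string name = LocalName(resource, index++);
                client.DownloadFile(resource, Path.Combine(filesDir, name));
                node.SetAttributeValue(attribute, FilesDirName + "/" + name);
            }
            catch (UriFormatException) { } // malformed link: leave it untouched
            catch (WebException) { }       // download failed: keep the original link
        }
    }
}
```

A call such as `PageSaver.Save("http://example.com/", @"C:\temp\saved")` would then write `index.html` plus a `page_files` subdirectory into the target folder. Links that fail to download are left pointing at the original site, so the saved page degrades gracefully rather than aborting.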

In effect you would be implementing a very simple web browser. You may run into issues with pages that use JavaScript to dynamically alter or request page content, but for most pages this should give acceptable results.

Upvotes: 6

STW

Reputation: 46366

It's definitely not elegant, but you could navigate a System.Windows.Forms.WebBrowser to the URL and then call its ShowSaveAsDialog() method to save the page.

Upvotes: 0

Tzury Bar Yochay

Reputation: 9004

From CodeProject: ZetaWebSpider

Upvotes: 1
