sledgehammer

Reputation: 89

Webpage download

I am having some issues downloading the source of a webpage. I can view the page fine in any browser, and I can also run a web spider and download the first page with no problem. But whenever I run the code below to grab the source of that page, I always get a 403 Forbidden error.

The 403 Forbidden error is returned as soon as the request is sent. Anyone have any ideas?

string urlAddress = "http://www.brownells.com/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

if (response.StatusCode == HttpStatusCode.OK)
{
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;

    // ...

    response.Close();
    readStream.Close();
}

Upvotes: 1

Views: 281

Answers (1)

Aydin

Reputation: 15294

If you're in a rush...

string uri = @"http://brownells.com";

HttpWebRequest request         = (HttpWebRequest)WebRequest.Create(uri);
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.UserAgent              = @"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
request.Accept                 = @"text/html";

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream            = response.GetResponseStream())
using (StreamReader reader      = new StreamReader(stream))
{
    Console.WriteLine (reader.ReadToEnd());
}

request.AutomaticDecompression sends an Accept-Encoding header notifying the server that we, the client, support both gzip and Deflate compression schemes, so there'll be some performance gain there. It isn't needed, though; the server only requires that you have your UserAgent and Accept headers set.


The tools for the job...

Remember: if you can do it in a browser, you can do it in C#. The only time you'll seriously struggle is when there's some JavaScript sorcery involved, e.g. the site setting cookies using JavaScript. It's rare, but it happens.

Back to the topic at hand...

  1. Download Fiddler, a web debugging proxy that's simply invaluable when debugging HTTP traffic. Install it and run it.
  2. Navigate to your website of choice.
  3. Check out Fiddler to see the request your browser sent, then check out what the server responded with.
  4. Replicate it using C#.
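Step 4 can be sketched like this: copy the headers Fiddler shows for the browser's request onto your HttpWebRequest before sending it. The header values below are placeholders, not a real capture; substitute whatever your own Fiddler trace shows. A CookieContainer is included so the request can round-trip any cookies the server sets:

```csharp
using System;
using System.Net;

class FiddlerReplay
{
    static void Main()
    {
        // Build the request the same way the browser did, per the Fiddler capture.
        var request = (HttpWebRequest)WebRequest.Create("http://www.brownells.com/");

        // Copy the headers from the capture (these values are examples only).
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64)";
        request.Accept    = "text/html";
        request.Headers["Accept-Language"] = "en-US,en;q=0.8";

        // Lets the request carry cookies back to the server on later calls.
        request.CookieContainer = new CookieContainer();

        // The request is now configured; printing the headers confirms the setup
        // without actually hitting the network.
        Console.WriteLine(request.UserAgent);
        Console.WriteLine(request.Accept);
    }
}
```

Anything the browser sent that you leave out is a candidate for why the server says 403, so work through the capture header by header.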



Edit

If you want to dump the output to a file, you need to write it through a StreamWriter:

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream            = response.GetResponseStream())
using (StreamReader reader      = new StreamReader(stream))
using (TextWriter writer        = new StreamWriter("filePath.html"))
{
    writer.Write(reader.ReadToEnd());
}
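If you'd rather skip the text-decoding round trip entirely, a Stream.CopyTo into a FileStream writes the raw response bytes straight to disk. This is a sketch along the same lines as the answer's code, not part of the original answer:

```csharp
using System;
using System.IO;
using System.Net;

class DumpToFile
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://www.brownells.com/");
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64)";
        request.Accept    = "text/html";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream            = response.GetResponseStream())
        using (FileStream file          = File.Create("filePath.html"))
        {
            // Copies the raw bytes; no StreamReader/StreamWriter decoding step.
            stream.CopyTo(file);
        }
    }
}
```

Note that CopyTo requires .NET 4 or later; on older frameworks you'd loop over Stream.Read and Stream.Write yourself.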

Upvotes: 2
