lerxst3

Reputation: 23

Getting 403 Exception fetching web page programmatically even though web page is available via browser

I'm trying to fetch the HTML of a page through code:

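WebRequest r = WebRequest.Create(szPageURL);
WebClient client = new WebClient();
try
{
    // For www.epa.gov, GetResponse() throws here with 403 (Forbidden)
    using (WebResponse resp = r.GetResponse())
    using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
    {
        szHTML = sr.ReadToEnd();
    }
}
catch (WebException ex)
{
    // ex.Status is WebExceptionStatus.ProtocolError in the failing case
    Console.WriteLine(ex.Status);
}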

This code works for URLs such as www.microsoft.com, www.google.com, or www.nasa.gov. However, for www.epa.gov (with either 'http' or 'https' in the URL parameter), r.GetResponse() throws an exception, even though I can easily open the page manually in a browser. The exception reports 403 (Forbidden), and its Status member says "ProtocolError". What does that mean? Why am I getting this for a page that is actually available? Does anyone have any ideas? Thanks!

BTW, I also tried it this way:

string downloadString = client.DownloadString(szPageURL);

I got the exact same exception.

Upvotes: 1

Views: 1190

Answers (1)

Hossein Golshani

Reputation: 1897

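Try this code; it works. It sends browser-like headers (User-Agent, Accept, and so on), which HttpWebRequest does not send by default: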

string Url = "https://www.epa.gov/";
CookieContainer cookieJar = new CookieContainer();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
request.CookieContainer = cookieJar;
request.Accept = @"text/html, application/xhtml+xml, */*";
request.Referer = @"https://www.epa.gov/";
request.Headers.Add("Accept-Language", "en-GB");
request.UserAgent = @"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)";
request.Host = @"www.epa.gov";
String htmlString;
// Dispose the response and the reader once the body has been read
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    htmlString = reader.ReadToEnd();
}
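
On the "ProtocolError" part of the question: WebExceptionStatus.ProtocolError means the server returned a complete HTTP response whose status code signals an error, so the 403 itself can be read from the exception's Response property. The following is only a minimal sketch of how that might be inspected. The FetchDiagnostics class and the Describe helper are hypothetical names made up for illustration, and whether a User-Agent header alone is enough for this particular site is an assumption, not something verified here.

```csharp
using System;
using System.IO;
using System.Net;

static class FetchDiagnostics
{
    // Hypothetical helper: summarizes a WebException in one line.
    public static string Describe(WebException ex)
    {
        // ProtocolError: the server answered, but with an error status (e.g. 403).
        if (ex.Status == WebExceptionStatus.ProtocolError && ex.Response is HttpWebResponse http)
        {
            return $"HTTP {(int)http.StatusCode} {http.StatusDescription}";
        }
        // Any other case: no HTTP response is attached (timeout, DNS failure, ...).
        return $"No HTTP response: {ex.Status}";
    }

    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("https://www.epa.gov/");
        // Assumption: the server rejects requests that lack a browser-like User-Agent.
        request.UserAgent = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)";
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();
                Console.WriteLine($"Fetched {html.Length} characters");
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine(Describe(ex));
        }
    }
}
```

If the request is still refused, the line printed by Describe at least shows whether the server answered with an HTTP status or the connection failed before any response arrived.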

Upvotes: 1
