Reputation: 55
I'm getting a 403 error whenever I try to do anything with an image's URL (whether getting the file size or downloading it), but I don't get any error when simply displaying the image.
I hope I'm clear enough, but if need be, here is an example of a URL that causes the problem:
Image URL / Site show the image
I'm using this code to get the file size. It works great in general, but not on this site, for example:
    public void getFileSize(string uri)
    {
        try
        {
            waitGetSize = 0;
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
            req.Timeout = 5000;  // milliseconds
            req.Method = "HEAD"; // ask for the headers only, not the image body
            // Dispose the response so the underlying connection is released
            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            {
                imgSize = resp.ContentLength; // bytes
                imgSizeKb = imgSize / 1024;
            }
            waitGetSize = 1;
        }
        catch (Exception ex)
        {
            MetroMessageBox.Show(this, ex.Message, "Exception :", MessageBoxButtons.OK, MessageBoxIcon.Error);
        }
    }
As pointed out by cFrozenDeath, I used a HEAD request, so I tried a GET request instead, to the exact same effect. I get the same result when I don't specify the request method at all.
So is there a way to get the file size or at least download the file knowing it's shown OK when opened in a browser?
Upvotes: 1
Views: 3018
Reputation: 42434
You have to mimic a web browser when you want to scrape content from websites.
Sometimes this means you need to provide and/or keep the cookies you get when you first land on a website; sometimes you have to tell the web server which page linked to the resource.
In this case you need to provide the Referer header:
    public void getFileSize(string uri)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        // Which page do we want the server to believe we are calling this from?
        req.Referer = "http://www.webtoons.com/";
        req.Timeout = 5000;
        req.Method = "GET"; // or do a HEAD
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            // rest omitted
        }
    }
That particular image has a length of 273073 bytes.
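If the Referer alone is not enough on some other site, the points above (Referer, cookies, looking like a browser) can be combined. The following is only a minimal sketch, not a definitive implementation: the `ImageProbe` class, its method names, and the User-Agent string are made up for illustration, and whether a given server needs any of these headers is something you have to find out by trying. It also reads the status code from the `WebException`, which makes it easier to tell a 403 from a timeout. On recent .NET versions `WebRequest.Create` is flagged as obsolete, but it is used here to stay close to the code in the question.

```csharp
using System;
using System.Net;

public static class ImageProbe
{
    // Builds a HEAD request that looks more like one a browser would send.
    // Nothing is sent over the network until GetResponse() is called.
    public static HttpWebRequest BuildRequest(string uri, string referer, CookieContainer cookies)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Method = "HEAD";            // headers only; use "GET" to download the body
        req.Timeout = 5000;             // milliseconds
        req.Referer = referer;          // the page that supposedly links to the image
        req.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"; // assumed value, adjust as needed
        req.CookieContainer = cookies;  // reuse the same container to keep cookies between requests
        return req;
    }

    // Returns the Content-Length in bytes, or -1 if the server refuses the
    // request or does not report a length.
    public static long GetFileSize(string uri, string referer, CookieContainer cookies)
    {
        try
        {
            using (HttpWebResponse resp = (HttpWebResponse)BuildRequest(uri, referer, cookies).GetResponse())
            {
                return resp.ContentLength;
            }
        }
        catch (WebException ex)
        {
            // For HTTP errors such as 403 the server's response is attached to the exception.
            using (HttpWebResponse error = ex.Response as HttpWebResponse)
            {
                if (error != null)
                    Console.WriteLine("Server answered {0} ({1})", (int)error.StatusCode, error.StatusDescription);
                else
                    Console.WriteLine(ex.Message); // timeout, DNS failure, etc.
            }
            return -1;
        }
    }
}
```

A call would look like `ImageProbe.GetFileSize(imageUrl, "http://www.webtoons.com/", new CookieContainer())`. If the site hands out cookies on its landing page, pass the same `CookieContainer` instance to every request so they are sent back with the image request.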
Do note that scraping content might be against the terms of service of the particular website. Make sure you don't end up doing illegal stuff.
Upvotes: 3