Error 403 while trying to download an image, but not to show it

Question

I get a 403 error whenever I try to do anything with an image's URL from code (whether getting the file size or downloading it), but I don't get any error when the image is simply displayed.

I hope I'm clear enough, but if need be, here is an example of a URL that causes the problem:

Image URL / Site showing the image

I'm using this code to get the file size. It works great in general, but not on this site, for example:

public void getFileSize(string uri)
{
    try
    {
        waitGetSize = 0;
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Timeout = 5000;
        req.Method = "HEAD";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        imgSize = resp.ContentLength;
        imgSizeKb = imgSize / 1024;
        waitGetSize = 1;
    }
    catch (Exception ex)
    {
        MetroMessageBox.Show(this, ex.Message, "Exception :", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
}

As pointed out by cFrozenDeath, I was using a HEAD request, so I tried a GET request instead, with exactly the same result. Leaving the request method unspecified gives the same result too.

So is there a way to get the file size or at least download the file knowing it's shown OK when opened in a browser?

rene · Accepted Answer

You have to mimic a web browser when you want to scrape content from websites.

Sometimes this means you need to provide and/or keep the cookies you get when you first land on a website; sometimes you have to tell the web server which page linked to the resource.

In this case you need to provide the Referer in the header:

public void getFileSize(string uri)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
    // which page do we want that server to believe we call this from
    req.Referer = "http://www.webtoons.com/";

    req.Timeout = 5000;
    req.Method = "GET";  // or do a HEAD
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    // rest omitted
}

That particular image has a length of 273073 bytes.
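
For completeness, here is a minimal sketch of how the Referer and cookie points above could be combined into a reusable helper that covers both getting the size and downloading the file. The class name `RefererImageClient`, its method names, and the `refererUri` and `targetPath` parameters are made up for illustration and are not part of the original code. The sketch assumes the server only checks the Referer header and, at most, cookies. Some sites also check other headers, in which case this alone may not be enough.

```csharp
using System;
using System.IO;
using System.Net;

public static class RefererImageClient
{
    // One container shared by all requests, so cookies set by the site
    // on an earlier response are sent back on later requests.
    private static readonly CookieContainer Cookies = new CookieContainer();

    // Builds a request that names the page we claim to come from.
    // refererUri is an assumption: pass the page that embeds the image.
    public static HttpWebRequest CreateRequest(string imageUri, string refererUri, string method)
    {
        var req = (HttpWebRequest)WebRequest.Create(imageUri);
        req.Referer = refererUri;
        req.CookieContainer = Cookies;
        req.Timeout = 5000;
        req.Method = method;
        return req;
    }

    // Returns the Content-Length reported by the server (-1 if it sends none).
    public static long GetFileSize(string imageUri, string refererUri)
    {
        var req = CreateRequest(imageUri, refererUri, "HEAD");
        using (var resp = (HttpWebResponse)req.GetResponse())
        {
            return resp.ContentLength;
        }
    }

    // Streams the image to disk with a GET request.
    public static void Download(string imageUri, string refererUri, string targetPath)
    {
        var req = CreateRequest(imageUri, refererUri, "GET");
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var body = resp.GetResponseStream())
        using (var file = File.Create(targetPath))
        {
            body.CopyTo(file);
        }
    }
}
```

Wrapping the response in `using` releases the connection once the call is done, which matters when many images are checked in a row. If the server still refuses the request, `GetResponse()` throws a `WebException`, so the `try`/`catch` from the question can stay around the call.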

Do note that scraping content might be against the terms of service of the particular website. Make sure you don't end up doing illegal stuff.
