Peter Quill

Reputation: 55

Error 403 when trying to download an image, but not when displaying it

I'm getting a 403 error whenever I try to do anything with an image's URL (whether getting the file size or downloading it), but I get no error when displaying the image.

I hope this is clear enough; if not, here is an example of a URL that causes the problem:

Image URL / Site that shows the image

I'm using this code to get the file size. It works well in general, but not on this site, for example:

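public void getFileSize(string uri)
{
    try
    {
        waitGetSize = 0;
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Timeout = 5000;    // milliseconds
        req.Method = "HEAD";   // ask for the headers only, not the body
        // Dispose the response so the underlying connection is released
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            imgSize = resp.ContentLength;   // -1 if the server sends no Content-Length
            imgSizeKb = imgSize / 1024;
        }
        waitGetSize = 1;
    }
    catch (Exception ex)
    {
        // A 403 Forbidden ends up here as a WebException thrown by GetResponse()
        MetroMessageBox.Show(this, ex.Message, "Exception :", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
}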

As cFrozenDeath pointed out, I was using a HEAD request, so I tried a GET request instead, with exactly the same result. Leaving the method unset (so it defaults to GET) gives the same result too.

So is there a way to get the file size, or at least download the file, given that it displays fine when opened in a browser?

Upvotes: 1

Views: 3018

Answers (1)

rene

Reputation: 42434

You have to mimic a web browser when you want to scrape content from websites.

Sometimes this means you need to keep the cookies you receive when you first land on a website and send them back on later requests; sometimes you have to tell the web server which page linked to the resource.
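For the cookie case, a minimal sketch (assuming the same HttpWebRequest API used in the question; the class CookieScrapeSketch and its members CreateRequest and Land are hypothetical names) is to share one CookieContainer across all requests. You visit the landing page first so the server's Set-Cookie headers are stored, and the container then sends those cookies back with the image request:

```csharp
using System;
using System.Net;

public static class CookieScrapeSketch
{
    // One shared cookie jar: cookies set by the landing page are stored here
    // and sent back automatically on later requests to the same site.
    public static readonly CookieContainer Cookies = new CookieContainer();

    // Hypothetical helper: every request created here reuses the shared jar.
    public static HttpWebRequest CreateRequest(string uri)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.CookieContainer = Cookies;
        req.Timeout = 5000; // milliseconds
        return req;
    }

    // Hypothetical helper: request the landing page so its cookies get stored.
    public static void Land(string pageUri)
    {
        using (var resp = (HttpWebResponse)CreateRequest(pageUri).GetResponse())
        {
            // The body is not needed; the cookies are already in Cookies.
        }
    }
}
```

Usage would be along the lines of calling Land with the page that shows the image, then building the image request with CreateRequest so the cookies go along with it.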

In this case you need to send a Referer header:

public void getFileSize(string uri)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
    // the page we want the server to believe the request comes from
    req.Referer = "http://www.webtoons.com/";

    req.Timeout = 5000;
    req.Method = "GET";  // or do a HEAD
    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        // rest omitted
    }
}
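If you use HttpClient instead of HttpWebRequest, the same idea can be sketched roughly as below. This is an untested sketch; the class RefererSizeSketch and its methods are hypothetical names. Note that HttpClient spells the property Referrer, although the header actually sent is still Referer:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RefererSizeSketch
{
    // A single shared client, with the same 5-second timeout as above.
    static readonly HttpClient Client = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };

    // Builds a HEAD request that claims to come from the given page.
    public static HttpRequestMessage BuildHeadRequest(string uri, string referer)
    {
        var request = new HttpRequestMessage(HttpMethod.Head, uri);
        request.Headers.Referrer = new Uri(referer); // sent as the "Referer" header
        return request;
    }

    // Returns the size in bytes, or null when the length isn't known.
    public static async Task<long?> GetFileSizeAsync(string uri, string referer)
    {
        using (HttpRequestMessage request = BuildHeadRequest(uri, referer))
        using (HttpResponseMessage response = await Client.SendAsync(request))
        {
            // Throws HttpRequestException on a 403, much like GetResponse() throws WebException.
            response.EnsureSuccessStatusCode();
            return response.Content.Headers.ContentLength;
        }
    }
}
```

If a particular server rejects HEAD, switching HttpMethod.Head to HttpMethod.Get in BuildHeadRequest should behave like the GET variant above.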

That particular image has a length of 273073 bytes.
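Since the question also asks about downloading the file, here is a minimal sketch that GETs the image with a Referer header and saves it to disk, along with the same integer kilobyte conversion the question uses for imgSizeKb. DownloadSketch, ToKilobytes and DownloadWithReferer are hypothetical names, and the Referer value is the same assumption as in the answer:

```csharp
using System;
using System.IO;
using System.Net;

public static class DownloadSketch
{
    // Same conversion as imgSizeKb in the question: integer division rounds down.
    public static long ToKilobytes(long bytes) => bytes / 1024;

    // Hypothetical helper: GET the resource with a Referer and save it to path.
    // Returns the number of bytes written.
    public static long DownloadWithReferer(string uri, string referer, string path)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Referer = referer;
        req.Timeout = 5000;
        req.Method = "GET";
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (Stream body = resp.GetResponseStream())
        using (FileStream file = File.Create(path))
        {
            body.CopyTo(file);
            return file.Position; // bytes copied into the file
        }
    }
}
```

For 273073 bytes, ToKilobytes gives 266, since 266 × 1024 = 272384 and the remaining 689 bytes are dropped by the integer division.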

Note that scraping content may violate the website's terms of service. Make sure you don't end up doing anything illegal.

Upvotes: 3
