Christoffer Molinder

Reputation: 31

Error when trying to scrape images

I'm trying to download images from URLs stored in a .txt file using Python 3, and I get an error on some websites. This is the error I get:

  File "C:/Scripts/ImageScraper/ImageScraper.py", line 14, in <module>
    dl()
  File "C:/Scripts/ImageScraper/ImageScraper.py", line 10, in dl
    urlretrieve(URL, IMAGE)
  File "C:\Python34\lib\urllib\request.py", line 186, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I'm using this code:

from urllib.request import urlretrieve

def dl():
    with open('links.txt', 'r') as input_file:
        for line in input_file:
            URL = line.strip()  # drop the trailing newline read from the file
            IMAGE = URL.rsplit('/', 1)[1]
            urlretrieve(URL, IMAGE)


if __name__ == '__main__':
    dl()

I'm assuming it's because they do not allow 'bots' to access their website. With some research I found out there is a way around this, at least when using urlopen, but I can't manage to apply the workaround to my code, which uses urlretrieve. Is it possible to get this to work?

Upvotes: 3

Views: 249

Answers (1)

gabhijit

Reputation: 3585

I think the error is a genuine HTTP 403 response, meaning the server is refusing access to that URL. Try printing each URL just before it is fetched and then opening it in your browser. If the browser also gets a 403 Forbidden, the resource itself is restricted. If it loads fine in the browser, the server is probably rejecting the request because of how the script identifies itself. For details, read up on HTTP status codes, specifically 403 Forbidden.
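
On the urlretrieve part of the question: urlretrieve calls urlopen internally, so one possible workaround is to install a global opener that sends a different User-Agent header before downloading. Below is a minimal sketch of that idea, not a guaranteed fix. The USER_AGENT value and the helper names install_user_agent, filename_from_url and download_all are made up for illustration, and it assumes the 403 is caused by the default Python-urllib User-Agent, which may not be the case for every site.

```python
import urllib.request
from urllib.error import HTTPError
from urllib.parse import urlsplit

# Assumption: a browser-like string; this exact value is only illustrative.
USER_AGENT = 'Mozilla/5.0'


def install_user_agent(user_agent=USER_AGENT):
    """Install a global opener that sends a custom User-Agent header.

    urlretrieve() goes through urlopen(), which uses the installed opener.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', user_agent)]
    urllib.request.install_opener(opener)
    return opener


def filename_from_url(url):
    """Return the last path component of the URL, ignoring any query string."""
    return urlsplit(url).path.rsplit('/', 1)[-1]


def download_all(links_path='links.txt'):
    """Download every URL listed in links_path, one URL per line."""
    install_user_agent()
    with open(links_path, 'r') as input_file:
        for line in input_file:
            url = line.strip()  # remove the trailing newline
            if not url:
                continue  # skip blank lines
            print('Fetching', url)  # makes it easy to retry the URL in a browser
            try:
                urllib.request.urlretrieve(url, filename_from_url(url))
            except HTTPError as err:
                # Report the failure and carry on with the remaining URLs.
                print('Failed with HTTP', err.code, 'for', url)
```

Calling download_all('links.txt') would then replace dl() in the original script. If a site still answers 403 with a browser-like User-Agent, it is likely checking something else (cookies, a Referer header, or an explicit block), and this sketch will not help there.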

Upvotes: 1
