smart_beaver

Reputation: 87

urllib request gives 404 error but works fine in browser

When I run this code:

import urllib.request

urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")

I get the following error:

Traceback (most recent call last):
  File "scraper.py", line 26, in <module>
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
  File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve 
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

But the link works fine in my browser. Why does it work in the browser but not when requested from Python? The same code works with other pictures from the same site.

Upvotes: 2

Views: 1208

Answers (2)

fodma1

Reputation: 3535

The request returns the following image:

[image]

If you check your browser's developer console, you can see that the response status is 404:

[image]

So what you see in the browser is i.redd.it's custom 404 "page" (which is itself an image).
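
To see from Python what the developer console shows, here is a minimal sketch that reports the status code and content type without raising on an HTTP error. The helper name `probe` is hypothetical, not part of urllib:

```python
import urllib.error
import urllib.request


def probe(url):
    """Return (status_code, content_type) for url, even on an HTTP error."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as e:
        # HTTPError carries the error response itself:
        # the status is in e.code and the headers are in e.headers.
        return e.code, e.headers.get("Content-Type")
```

Calling `probe("https://i.redd.it/53tfh959wnv41.jpg")` should, going by the traceback in the question, report a 404 status. If the content type is an image type, that would match the browser showing a picture despite the error.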

EDIT:

So urlretrieve raises an HTTPError on a 404 status code. If you want to use the body of the response anyway (even when the status code is 404), you can do the following:

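import urllib.error
import urllib.request

try:
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
except urllib.error.HTTPError as e:
    # HTTPError is file-like: read() returns the body of the error response
    with open("error_photo.jpg", "wb") as fp:
        fp.write(e.read())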

Upvotes: 2

floordiv

Reputation: 1

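Try changing the User-Agent. urlretrieve does not accept a headers argument, so build a Request with the header and pass it to urlopen:

req = urllib.request.Request("https://i.redd.it/53tfh959wnv41.jpg", headers={"User-Agent": "put custom user agent here"})
with urllib.request.urlopen(req) as resp, open("photo.jpg", "wb") as fp:
    fp.write(resp.read())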
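
Since urlretrieve has no headers parameter in the standard library, one way to keep using it is to install a global opener that sends a custom User-Agent. This is a minimal sketch: `install_user_agent` and the agent string are hypothetical, and nothing in this thread confirms that a different User-Agent changes the 404 for this particular URL.

```python
import urllib.request


def install_user_agent(user_agent):
    """Make later urlopen/urlretrieve calls send the given User-Agent."""
    opener = urllib.request.build_opener()
    # Overwrite the opener's default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    # urlretrieve goes through urlopen, which uses the installed opener.
    urllib.request.install_opener(opener)
    return opener
```

After `install_user_agent("my-scraper/0.1")`, a plain `urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")` sends that header. Note that the installed opener is process-wide, so it affects every later urlopen call as well.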

Upvotes: 0
