Lanc

Reputation: 878

Getting headers with Python requests library

I am using the Python requests library to fetch the headers of HTML pages and read the encoding from them. For some links, however, requests fails to get the headers. In those cases I would like to fall back to the encoding "utf-8". How do I handle the error raised by requests.head?

Here is my code:

r = requests.head(link) #how to handle error in case this fails?
charset = r.encoding
if (not charset):
    charset = "utf-8"

This is the error I get when requests fails to fetch the headers:

 File "parsexml.py", line 78, in parsefile
  r = requests.head(link)
 File "/usr/lib/python2.7/dist-packages/requests/api.py", line 74, in head
   return request('head', url, **kwargs)
 File "/usr/lib/python2.7/dist-packages/requests/api.py", line 40, in request
   return s.request(method=method, url=url, **kwargs)
 File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 229, in request
   r.send(prefetch=prefetch)
 File "/usr/lib/python2.7/dist-packages/requests/models.py", line 605, in send
   raise ConnectionError(e)
 requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.standardzilla.com', port=80): Max retries exceeded with url: /2008/08/01/diaries-of-a-freelancer-day-thirty-seven/

Upvotes: 0

Views: 737

Answers (1)

Noel Evans

Reputation: 8536

Put the call in a try/except block and catch requests.exceptions.ConnectionError, like this:

try:
    r = requests.head(link)
    # r.encoding is None when the response declares no charset
    charset = r.encoding or "utf-8"
except requests.exceptions.ConnectionError:
    # The request itself failed, so fall back to the default encoding
    print('Unable to access ' + link)
    charset = "utf-8"
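
If you do this for many links, the same try/except can be wrapped in a small helper. The following is only a sketch under a few assumptions of mine: the function name get_charset, the timeout value, and the head parameter are not from the question. It catches requests.exceptions.RequestException, the base class that ConnectionError and Timeout derive from, so a slow server is treated the same way as an unreachable one.

```python
import requests


def get_charset(link, default="utf-8", timeout=5.0, head=requests.head):
    """Return the charset declared in the response headers, or `default`.

    `get_charset`, `timeout` and `head` are illustrative names; `head` is
    injectable only so the helper can be exercised without network access.
    """
    try:
        r = head(link, timeout=timeout)
    except requests.exceptions.RequestException:
        # Base class of ConnectionError, Timeout, and the other
        # exceptions requests raises for a failed request.
        return default
    # r.encoding is None when the headers declare no charset.
    return r.encoding or default
```

With that in place the calling code reduces to charset = get_charset(link). Whether to swallow every RequestException or only ConnectionError depends on how much you want to know about why a link failed.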

Upvotes: 2
