Sam Perry

Reputation: 2614

urllib2 URLError: Reading Server Response Codes (Python)

I have a list of URLs. I'd like to check the server response code of each and find out whether any are broken. I can handle server errors (500) and broken links (404) fine, but the code breaks as soon as it hits an address that isn't a real website (e.g. "notawebsite_broken.com"). I've searched around and not found the answer... I hope you can help.

Here's the code:

import urllib2

# List of URLs. The third URL is not a website
urls = ["http://www.google.com", "http://www.ebay.com/broken-link",
        "http://notawebsite_broken"]

# Empty list to store the output
response_codes = []

# Run "for" loop: get server response code and save results to response_codes
for url in urls:
    try:
        connection = urllib2.urlopen(url)
        response_codes.append(connection.getcode())
        connection.close()
        print url, ' - ', connection.getcode()
    except urllib2.HTTPError, e:
        response_codes.append(e.getcode())
        print url, ' - ', e.getcode()

print response_codes

This gives the following output:

http://www.google.com  -  200
http://www.ebay.com/broken-link  -  404
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    connection = urllib2.urlopen(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>

Does anyone know a fix for this or can anyone point me in the right direction?

Upvotes: 2

Views: 2173

Answers (3)

Jonathan Vanasco

Reputation: 15690

The API for the urllib2 library is a nightmare.

Many people, myself included, strongly recommend using the requests package.

One of the nicer things about requests is that all request issues inherit from a single base exception class. When you use urllib2 "raw", exceptions can be raised by urllib2 itself, by the socket module, and possibly by others (I can't remember exactly, but it's messy).
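
For instance, a minimal sketch (not from the original answer, assuming requests is installed) of catching that base class, requests.exceptions.RequestException, instead of a bare Exception:

import requests

try:
    r = requests.get("http://notawebsite_broken")
    print r.status_code
except requests.exceptions.RequestException, e:
    # connection errors, DNS failures, timeouts, invalid URLs, etc.
    # all inherit from this one base class
    print e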

tldr -- just use the requests library.

Upvotes: 1

Padraic Cunningham

Reputation: 180540

You could use requests:

import requests

urls = ["http://www.google.com","http://www.ebay.com/broken-link",
"http://notawebsite_broken"]

for u in urls:
    try:
        r = requests.get(u)
        print "{} {}".format(u,r.status_code)
    except Exception,e:
        print "{} {}".format(u,e)

This outputs:

http://www.google.com 200
http://www.ebay.com/broken-link 404
http://notawebsite_broken HTTPConnectionPool(host='notawebsite_broken', port=80): Max retries exceeded with url: /

Upvotes: 3

Sohcahtoa82

Reputation: 639

When urllib2.urlopen() fails to connect to the server, or fails to resolve the host name, it raises a URLError rather than an HTTPError. You'll need to catch urllib2.URLError in addition to urllib2.HTTPError to handle those cases.
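
For example, here is a minimal sketch of the question's loop with a second except clause added (recording None for URLs that never produced a status code is just one possible choice):

import urllib2

urls = ["http://www.google.com", "http://www.ebay.com/broken-link",
        "http://notawebsite_broken"]

response_codes = []

for url in urls:
    try:
        connection = urllib2.urlopen(url)
        response_codes.append(connection.getcode())
        print url, ' - ', connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        # HTTP-level errors (404, 500, ...) still carry a status code
        response_codes.append(e.getcode())
        print url, ' - ', e.getcode()
    except urllib2.URLError, e:
        # no response at all (bad hostname, refused connection, ...),
        # so there is no status code; record None and show the reason
        response_codes.append(None)
        print url, ' - ', e.reason

print response_codes

Note that urllib2.HTTPError is a subclass of urllib2.URLError, so the HTTPError clause has to come first.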

Upvotes: 1
