bdhar

Reputation: 22973

Skip URL if timeout

I have a list of URLs.

I am using the following to retrieve their contents:

import urllib2

for url in url_list:
    req = urllib2.Request(url)
    resp = urllib2.urlopen(req, timeout=5)
    resp_page = resp.read()
    print resp_page

When there is a timeout, the program just crashes. I just want to skip to the next URL whenever a socket.timeout: timed out occurs. How can I do this?

Thanks

Upvotes: 2

Views: 6092

Answers (3)

Jir

Reputation: 3125

Although there already is an answer, I'd like to point out that urllib2 might not be solely responsible for this behavior.

As pointed out here (and as the problem description also suggests), the exception may come from the socket library.

In that case, just add another except clause:

import socket
import urllib2

try:
    # the timeout can surface as urllib2.URLError or as a bare socket.timeout
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Bad URL or timeout"
except socket.timeout:
    print "socket timeout"

Upvotes: 7

bigendian

Reputation: 818

Sounds like you just need to catch the timeout exception. I don't get the socket.timeout message that you do.

import urllib2

req = urllib2.Request("http://127.0.0.2")
try:
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Timeout!"

Obviously, you need a URL that will actually time out (127.0.0.2 may not on your box).
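
If you need a URL that reliably hangs for testing, a non-routable address is one option; a minimal sketch, assuming 10.255.255.1 is not routable from your network (a commonly unroutable address, not guaranteed on every setup):

import urllib2

try:
    # connecting to a non-routable address usually blocks until the timeout fires
    urllib2.urlopen("http://10.255.255.1", timeout=2)
except urllib2.URLError as e:
    print "Timed out or unreachable:", e.reason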

Upvotes: 1

agf

Reputation: 176740

I'm going to go ahead and assume that by "crashes" you mean "raises a URLError", as described by the urllib2.urlopen docs. See the Errors and Exceptions section of the Python Tutorial.

import urllib2

for url in url_list:
    req = urllib2.Request(url)
    try:
        resp = urllib2.urlopen(req, timeout=5)
    except urllib2.URLError:
        print "Bad URL or timeout"
        continue  # skips to the next iteration of the loop
    resp_page = resp.read()
    print resp_page
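
If the loop is going to grow, it may read more cleanly to pull the fetch into a small helper that returns None on failure. This is only a sketch, and fetch_page is a hypothetical name, with socket.timeout caught as well per the other answer:

import socket
import urllib2

def fetch_page(url, timeout=5):
    # hypothetical helper: returns the page body, or None on a bad URL or timeout
    try:
        return urllib2.urlopen(urllib2.Request(url), timeout=timeout).read()
    except (urllib2.URLError, socket.timeout):
        return None

for url in url_list:
    resp_page = fetch_page(url)
    if resp_page is None:
        print "Bad URL or timeout"
        continue
    print resp_page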

Upvotes: 1
