user3822497

Reputation: 43

Python urllib2 does not respect timeout

The following two lines of code hang forever:

import urllib2
urllib2.urlopen('https://www.5giay.vn/', timeout=5)

This is with python2.7, and I have no http_proxy or any other env variables set. Any other website works fine. I can also wget the site without any issue. What could be the issue?

Upvotes: 4

Views: 2671

Answers (1)

unutbu

Reputation: 879143

If you run

import urllib2

url = 'https://www.5giay.vn/'
urllib2.urlopen(url, timeout=1.0)

wait for a few seconds, and then use C-c to interrupt the program, you'll see

  File "/usr/lib/python2.7/ssl.py", line 260, in read
    return self._sslobj.read(len)
KeyboardInterrupt

This shows that the program is hanging on self._sslobj.read(len).

SSL timeouts raise socket.timeout, which in Python 2.7 is a subclass of IOError, so the except IOError clause in the example below catches it.

You can control the delay before socket.timeout is raised by calling socket.setdefaulttimeout(1.0).

For example,

import urllib2
import socket

socket.setdefaulttimeout(1.0)
url = 'https://www.5giay.vn/'
try:
    urllib2.urlopen(url, timeout=1.0)
except IOError as err:
    print('timeout')

% time script.py
timeout

real    0m3.629s
user    0m0.020s
sys 0m0.024s

Note that the requests module succeeds here even though urllib2 does not:

import requests
r = requests.get('https://www.5giay.vn/')
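
Even so, it is worth passing an explicit timeout to requests, since by default it can block indefinitely. A minimal sketch (the 5-second value is just an illustration, and like urlopen's timeout it limits individual socket operations, not the whole call):

import requests

try:
    # timeout covers connecting and each read from the socket,
    # not the total duration of the request
    r = requests.get('https://www.5giay.vn/', timeout=5)
    print(r.status_code)
except requests.exceptions.Timeout:
    print('timeout')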

How to enforce a timeout on the entire function call:

socket.setdefaulttimeout only affects how long Python waits on an individual blocking socket operation before raising an exception if the server has not responded.

Neither it nor urlopen(..., timeout=...) enforces a time limit on the entire function call.

To do that, you could use eventlet, as shown here.
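
For reference, here is a minimal sketch of that approach (assuming the eventlet package is installed; the 5-second limit is just illustrative):

import eventlet
eventlet.monkey_patch()   # patch the stdlib so urllib2's blocking socket/SSL calls become interruptible

import urllib2

url = 'https://www.5giay.vn/'
try:
    with eventlet.Timeout(5):   # raises eventlet.Timeout if the block takes longer than 5 seconds
        print(urllib2.urlopen(url))
except eventlet.Timeout:
    print('timeout')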

If you don't want to install eventlet, you could use multiprocessing from the standard library, though this solution will not scale as well as an asynchronous solution such as the one eventlet provides.

import urllib2
import multiprocessing as mp

def timeout(t, cmd, *args, **kwds):
    # Run cmd(*args, **kwds) in a single worker process and give up after t seconds.
    pool = mp.Pool(processes=1)
    result = pool.apply_async(cmd, args=args, kwds=kwds)
    try:
        retval = result.get(timeout=t)
    except mp.TimeoutError:
        # Kill the worker outright so a hung urlopen call does not linger.
        pool.terminate()
        pool.join()
        raise
    else:
        return retval

def open(url):
    response = urllib2.urlopen(url)
    print(response)

url = 'https://www.5giay.vn/'
try:
    timeout(5, open, url)
except mp.TimeoutError:
    print('timeout')

Running this will either succeed or time out in about 5 seconds of wall clock time.

Upvotes: 5
