Reputation: 77
I have a small crawler that extracts the content of a simple web page.
from urllib.request import urlopen

def url2dict(url):
    '''
    DOCSTRING: converts two-column data into a dictionary with first column as a key.
    INPUT: URL address as a string
    OUTPUT: dictionary with one key and one value
    '''
    # Fetch the raw bytes of the page; urlopen raises HTTPError on a 4xx/5xx response
    with urlopen(url) as page:
        page_raw = page.read()
    ...
This function calls the server at url. The problem is that the server responded with a 504 error:
File "C:\Python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Python38\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Python38\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 504: Gateway Time-out
My problem is that I cannot find the default value of the urlopen timeout.
Here https://bugs.python.org/issue18417 it is said that there is no timeout (timeout = None) by default (at least for Python 3.4 version):
OK, I reviewed the issue enough to remember: If socket.setdefaulttimeout is never called, then the default timeout is None (no timeout).
What is the current state for 3.8?
If no timeout is set, why did I get this 504 error?
More details:
One of the traceback frames points to
File "C:\Python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
I opened the file and read:
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, cafile=None, capath=None, cadefault=False, context=None):
    '''Open the URL url, which can be either a string or a Request object.

    *data* must be an object specifying additional data to be sent to
    the server, or None if no such data is needed. See Request for
    details.

    urllib.request module uses HTTP/1.1 and includes a "Connection:close"
    header in its HTTP requests.

    The optional *timeout* parameter specifies a timeout in seconds for
    blocking operations like the connection attempt (if not specified, the
    global default timeout setting will be used). This only works for HTTP,
    HTTPS and FTP connections.
So does "if not specified, the global default timeout setting will be used" mean that if I define a global variable called timeout, it would be used as the timeout duration?
Upvotes: 1
Views: 4143
Reputation: 9664
Your research was correct: the default timeout is determined by socket._GLOBAL_DEFAULT_TIMEOUT. To learn its value, you can use socket.getdefaulttimeout():
Return the default timeout in seconds (float) for new socket objects. A value of None indicates that new socket objects have no timeout. When the socket module is first imported, the default is None.
TL;DR by default there is no timeout set.
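To check this on your own interpreter, and to see what "global default timeout" refers to, here is a minimal sketch. The "global default" is the value managed by socket.setdefaulttimeout(), not a variable named timeout in your module. The default_socket_timeout helper is a hypothetical name introduced only for this illustration; it restores the previous value when the block ends.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default socket timeout.

    Hypothetical helper for illustration; not part of the standard library.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Put the old value back so other code is not affected
        socket.setdefaulttimeout(previous)


# None here means new sockets block indefinitely (no timeout),
# unless something in the process has already called setdefaulttimeout().
print(socket.getdefaulttimeout())

with default_socket_timeout(10.0):
    # Any urlopen(url) call made here without an explicit timeout
    # would use 10 seconds for its blocking socket operations.
    print(socket.getdefaulttimeout())
```

If you only want a timeout for one request, passing timeout=... directly to urlopen is usually simpler than changing the process-wide default.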
The timeout you're seeing is a response from the server as per the RFC:
The 504 (Gateway Timeout) status code indicates that the server, while acting as a gateway or proxy, did not receive a timely response from an upstream server it needed to access in order to complete the request.
The machine you're making the request against is a front end, for instance a (reverse) proxy serving content from another server, e.g. to balance load between multiple back-end servers.
Since the problem is on the server side, you cannot do much about it except perhaps catch the error and, if you know or believe it is intermittent, retry.
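A rough sketch of that catch-and-retry idea, assuming the 504 is intermittent. The function name fetch_with_retry, the attempt count, the delay and the 30-second client timeout are all illustrative choices, not values taken from the question. The opener and sleep parameters exist only so the function can be exercised without a real network call.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_with_retry(url, attempts=3, delay=2.0, timeout=30.0,
                     opener=urlopen, sleep=time.sleep):
    """Return the page body, retrying when the server answers 504.

    Hypothetical helper: all defaults are assumptions for illustration.
    """
    for attempt in range(1, attempts + 1):
        try:
            # The explicit timeout only bounds the client-side socket
            # operations; it does not stop the gateway from returning 504.
            with opener(url, timeout=timeout) as page:
                return page.read()
        except HTTPError as err:
            # Re-raise anything that is not a 504, and give up
            # once the last attempt has failed.
            if err.code != 504 or attempt == attempts:
                raise
            # Wait a little longer after each failed attempt
            sleep(delay * attempt)
```

Inside url2dict you could then replace the `with urlopen(url) as page:` block with `page_raw = fetch_with_retry(url)`. If the 504 is persistent rather than intermittent, retrying will not help and the fix has to happen on the server side.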
Upvotes: 1