Reputation: 337
I am using Tornado to asynchronously scrape data from many thousands of URLs. Each of them is 5-50MB, so they take a while to download. I keep getting "Exception: HTTP 599: Connection closed http:…" errors, despite the fact that I am setting both connect_timeout and request_timeout to a very large number.
Why, despite the large timeout settings, am I still timing out on some requests after only a few minutes of running the script? Is there a way to instruct httpclient.AsyncHTTPClient to NEVER time out? Or is there a better solution to prevent timeouts?
This is how I'm calling the fetch (each worker calls this request_and_save_url() sub-coroutine from the Worker() coroutine):
from functools import partial
from tornado import gen, httpclient

@gen.coroutine
def request_and_save_url(url, q_all):
    try:
        response = yield httpclient.AsyncHTTPClient().fetch(
            url, partial(handle_request, q_all=q_all),
            connect_timeout=60 * 24 * 3 * 999999999,
            request_timeout=60 * 24 * 3 * 999999999)
    except Exception as e:
        print('Exception: {0} {1}'.format(e, url))
        raise gen.Return([])
Upvotes: 0
Views: 1256
Reputation: 12587
As you note, HTTPError 599 is raised on a connection or request timeout, but that is not the only case. It is also raised when the connection has been closed by the server before the request finishes (including fetching the entire response), e.g. because the server timed out while handling the request or dropped the connection for some other reason.
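Since you cannot stop the server from dropping connections, one common way to cope is to catch the 599 and retry the fetch a few times. Below is a minimal sketch under those assumptions; fetch_with_retries is a hypothetical helper (not part of your script or Tornado), and the timeout values are only illustrative:

from tornado import gen, httpclient

@gen.coroutine
def fetch_with_retries(url, max_retries=3):
    # Hypothetical helper: retry when the server drops the connection
    # (HTTP 599) before the full response has been read.
    for attempt in range(max_retries):
        try:
            response = yield httpclient.AsyncHTTPClient().fetch(
                url,
                connect_timeout=300,      # illustrative, generous limits
                request_timeout=3600)     # for 5-50MB downloads
            raise gen.Return(response)
        except httpclient.HTTPError as e:
            if e.code != 599 or attempt == max_retries - 1:
                raise
            # Back off briefly before retrying the dropped connection.
            yield gen.sleep(2 ** attempt)

Each retry opens a fresh connection, so a transient server-side drop on a long download does not kill the whole job; after max_retries failures the error propagates as before.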
Upvotes: 1