Reputation: 337
I am using Tornado to asynchronously scrape data from many thousands of URLs. Each of them is 5-50MB, so they take a while to download. I keep getting "Exception: HTTP 599: Connection closed http:…" errors, despite the fact that I am setting both connect_timeout and request_timeout to a very large number.
Why, despite the large timeout settings, am I still timing out on some requests after only a few minutes of running the script? Is there a way to instruct httpclient.AsyncHTTPClient to NEVER time out? Or is there a better solution to prevent timeouts?
This is how I'm calling the fetch (each worker calls this request_and_save_url() sub-coroutine from the Worker() coroutine):
from functools import partial
from tornado import gen, httpclient

@gen.coroutine
def request_and_save_url(url, q_all):
    try:
        response = yield httpclient.AsyncHTTPClient().fetch(
            url, partial(handle_request, q_all=q_all),
            connect_timeout=60 * 24 * 3 * 999999999,
            request_timeout=60 * 24 * 3 * 999999999)
    except Exception as e:
        print('Exception: {0} {1}'.format(e, url))
        raise gen.Return([])
Upvotes: 0
Views: 1256
Reputation: 12587
As you note, HTTPError 599 is raised on a connection or request timeout, but that is not the only case. It is also raised when the connection has been closed by the server before the request finishes (including fetching the entire response), e.g. because the server timed out while handling the request or dropped the connection for some other reason.
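Since you cannot stop the server from dropping connections, one common way to cope is to catch the 599 and retry the fetch a few times. Below is a minimal sketch under those assumptions; fetch_with_retries is a hypothetical helper (not part of your script or Tornado), and the timeout values are only illustrative:

from tornado import gen, httpclient

@gen.coroutine
def fetch_with_retries(url, max_retries=3):
    # Hypothetical helper: retry when the server drops the connection
    # (HTTP 599) before the full response has been read.
    for attempt in range(max_retries):
        try:
            response = yield httpclient.AsyncHTTPClient().fetch(
                url,
                connect_timeout=300,      # illustrative, generous limits
                request_timeout=3600)     # for 5-50MB downloads
            raise gen.Return(response)
        except httpclient.HTTPError as e:
            if e.code != 599 or attempt == max_retries - 1:
                raise
            # Back off briefly before retrying the dropped connection.
            yield gen.sleep(2 ** attempt)

Each retry opens a fresh connection, so a transient server-side drop on a long download does not kill the whole job; after max_retries failures the error propagates as before.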
Upvotes: 1