streethacker

Reputation: 328

Python hold on the HTTP connection before sending files?

Problem

This problem has bothered me for quite a long time. I'm working on a web API that queries the database by some specific conditions, fetches the matching data, and generates an .xls file for download.

However, the amount of data is very large, so generating the .xls file takes a relatively long time, which may cause an HTTP timeout. I previously wrote a generator that yields the records line by line in .csv format. It performs well (I mean it is fast to generate and download), but it has some side effects, as I mentioned in my previous two questions:

Questions

After serious consideration, I finally decided to generate the whole .xls file on the server side and then provide it for download. But how can I maintain the http connection during the time for generating the .xls file?

Upvotes: 1

Views: 249

Answers (2)

reptilicus

Reputation: 10397

I agree with @Jan, server-sent events (SSE) are probably the way to go. If you want to get fancier, you could set up a Celery task queue and listen for a task_complete signal, then notify the user via an SSE that the download is ready. Here is an example of using SSE in Flask. And here is a link to Celery and signaling.

Another way to do it would be to start an async Celery task in the initial request, then keep checking whether the task has completed via an AJAX request in a client-side setInterval(). The route would just check MyTask.AsyncResult(task_id).state.
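The start-then-poll flow can be sketched without Celery. This is only a minimal illustration: it swaps in `concurrent.futures.ThreadPoolExecutor` from the standard library as a stand-in for the task queue, and `build_xls`, `start_export`, and `export_state` are hypothetical names. In a real app the two handler functions would sit behind HTTP routes, and the `Future` checks would be replaced by `AsyncResult(task_id).state`.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a Celery worker pool (assumption: the real app would use Celery).
_executor = ThreadPoolExecutor(max_workers=2)
_jobs = {}  # task_id -> Future


def build_xls(rows):
    """Hypothetical slow export; returns the path of the finished file."""
    time.sleep(0.2)  # simulate the expensive .xls generation
    return "/tmp/export-%d-rows.xls" % rows


def start_export(rows):
    """Handler for the initial request: queue the job and return at once."""
    task_id = uuid.uuid4().hex
    _jobs[task_id] = _executor.submit(build_xls, rows)
    return task_id


def export_state(task_id):
    """Handler for the polling request: report the state, plus the file when done."""
    future = _jobs.get(task_id)
    if future is None:
        return {"state": "UNKNOWN"}  # illustrative label, not a Celery state
    if not future.done():
        return {"state": "PENDING"}
    if future.exception() is not None:
        return {"state": "FAILURE"}
    return {"state": "SUCCESS", "file": future.result()}


if __name__ == "__main__":
    tid = start_export(1000)
    # The browser would do this loop with setInterval() and an AJAX call.
    while export_state(tid)["state"] == "PENDING":
        time.sleep(0.05)
    print(export_state(tid))
```

Because the first request returns as soon as the job is queued, no single HTTP request has to stay open for the whole export.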

Upvotes: 2

Dr. Jan-Philip Gehrcke

Reputation: 35776

"How can I maintain the http connection during the time for generating the .xls file?"

Simple answer: you cannot. At least you cannot guarantee that a single simple HTTP GET request (and the underlying TCP connection) works reliably. Depending on the client specifics and the network the client is inside, your users might often experience errors (connection timeouts which your application does not handle).

So, the right question is: which technology do you need to let users get this file, independent of how long it takes to generate and of how bad their Internet connection is?

There are many possible approaches, and all of them have their disadvantages. Depending on which browsers you want to support, there are a couple of options. All of them require client-side JavaScript usage.

You might want to use the modern Server-Sent Events, which allow the server to actively trigger an event in the browser, to which the browser can respond as desired.
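To make this concrete, here is a rough sketch of what the server would write to an event stream. It covers only the `text/event-stream` message framing and a generator that reports when the file is ready; `format_sse`, `export_events`, the `is_ready` callback, and the event names are all hypothetical.

```python
import json
import time


def format_sse(data, event=None):
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append("event: " + event)
    # Each line of the payload needs its own "data:" field.
    for part in data.splitlines() or [""]:
        lines.append("data: " + part)
    return "\n".join(lines) + "\n\n"  # a blank line ends the message


def export_events(is_ready, download_url, interval=1.0):
    """Yield SSE messages until the (hypothetical) export is ready."""
    while not is_ready():
        yield format_sse(json.dumps({"ready": False}), event="progress")
        time.sleep(interval)
    yield format_sse(json.dumps({"ready": True, "url": download_url}), event="done")
```

In Flask, a generator like this could presumably be returned as `Response(export_events(...), mimetype="text/event-stream")`, with the browser listening through an `EventSource` and starting the download when the `done` event arrives.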

A more classical approach would be (long) polling over HTTPS, where you do as before, but configure the timeouts in the client as well as the server to be quite large. Additionally, you need to have JavaScript in place that simply repeats the request if it has timed out. Also, there are dirty techniques established for preventing a timeout.
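The repeat-on-timeout idea is described for browser JavaScript, but the same loop can be sketched for a script client in Python. This is a minimal sketch under assumptions: `fetch_with_retry` and its parameters are hypothetical, and the `opener` argument exists only so the loop can be exercised without a real server.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_retry(url, timeout=300, max_attempts=5, opener=urllib.request.urlopen):
    """Repeat a slow GET with a generous timeout until it succeeds."""
    last_error = None
    for _ in range(max_attempts):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.URLError as exc:
            # urlopen wraps connect-phase timeouts in URLError; re-raise anything else.
            if not isinstance(exc.reason, socket.timeout):
                raise
            last_error = exc
        except socket.timeout as exc:
            # Timeout while waiting for or reading the response: ask again.
            last_error = exc
    raise RuntimeError("gave up after %d attempts" % max_attempts) from last_error
```

Only timeouts trigger a retry here; other errors, such as an HTTP 500, propagate to the caller.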

You might want to do some research, using the terms "server push", "comet", and "long polling". Doing so, you will probably read about WebSockets (which you do not directly need, in my opinion).

I guess if I were you I would now choose to use Server-sent events. But you have to figure this out yourself, depending on your exact requirements.

At a quick glance, the introduction to this article may be a good read: https://jersey.java.net/documentation/latest/sse.html

Also, the introduction of the W3C Server-Sent Events specification is nice. Quote:

Event streams requests can be redirected using HTTP 301 and 307 redirects as with normal HTTP requests. Clients will reconnect if the connection is closed; a client can be told to stop reconnecting using the HTTP 204 No Content response code.

Using this API rather than emulating it using XMLHttpRequest or an iframe allows the user agent to make better use of network resources in cases where the user agent implementor and the network operator are able to coordinate in advance. Amongst other benefits, this can result in significant savings in battery life on portable devices. This is discussed further in the section below on connectionless push.

Upvotes: 2
