Does Django file upload occupy process for duration of upload?

I'm using Django 1.3 behind Apache via mod_wsgi (daemon mode, 3 worker processes).

If a user is uploading a file, does that completely occupy one of the processes for the duration of the upload, or can it handle other requests while waiting for chunks of data to become available? If 3 users are uploading 3 files, will all new requests be queued until the uploads finish?

Edit: I'm currently using the worker MPM, with 1 thread per daemon process. I am willing to change my configuration if there is a good reason to do so.

Edit 2: Ideally, I would like Apache to handle the upload and pass it on to Django once the whole file has been uploaded. Is this how it behaves by default, and if not, is there any configuration I can change to make this happen?

Answers (2)

SingleNegationElimination

You said:

Ideally what I would like is for apache to handle the upload and pass it on to django when all the file has been uploaded.

Alas, HTTP doesn't really work that way. The problem is that HTTP request bodies could be files, message queues, or some other creative use of a (possibly empty) stream of octets. For one thing, this means it would be wrong for the application server to decide how to handle the request body; the application might want to read only a portion of the request and, if it doesn't pass muster (say, invalid authentication credentials, or any other reason), close the connection before bandwidth has been spent on the rest of the invalid request.

This is especially a concern when requests can be of arbitrary, unbounded size (with, for instance, the chunked transfer encoding). The application server can elegantly abstract the request body into a simple file stream, but can't make any better decision about its ultimate fate than that.
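To make the "unbounded size" point concrete, here is a minimal sketch of decoding a chunked request body, assuming the raw body is available as a binary stream. In a real deployment the web server normally does this decoding, so the application only sees a plain stream; `iter_chunked` is a hypothetical helper that ignores chunk extensions and skips trailer fields. Note that the total length is only known once the zero-size chunk arrives.

```python
import io


def iter_chunked(stream):
    """Yield the data of an HTTP/1.1 chunked body read from a binary stream.

    Hypothetical sketch: chunk extensions are ignored and trailers are skipped.
    """
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("connection closed before final chunk")
        # The chunk size is hex, optionally followed by ";extension".
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Last chunk: skip optional trailer fields up to the blank line.
            while stream.readline() not in (b"\r\n", b"\n", b""):
                pass
            return
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        stream.read(2)  # consume the CRLF that ends each chunk
        yield data


body = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
print(b"".join(iter_chunked(body)))  # b'Wikipedia'
```

Because the decoder is a generator, a consumer can stop after the first chunk and never read the rest, which is the same freedom the answer argues the application should keep.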

For this reason, HTTP servers normally call the application as soon as all of the headers have been read, with the request body ready for streaming, if the application desires.
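As a minimal sketch of what this means for a WSGI application (the interface mod_wsgi exposes to Django), the app below inspects a header before touching the body and then reads `wsgi.input` in chunks. The `HTTP_AUTHORIZATION` check is a hypothetical stand-in for real authentication, and `CHUNK_SIZE` and `call` are illustrative names, not mod_wsgi or Django APIs.

```python
import io

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, not a mod_wsgi default


def upload_app(environ, start_response):
    """Minimal WSGI app: inspect headers first, then stream the body in chunks."""
    # The server calls us once the headers are parsed; the body is still unread.
    if "HTTP_AUTHORIZATION" not in environ:
        # Hypothetical auth rule: reject without consuming the request body.
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"missing credentials\n"]

    remaining = int(environ.get("CONTENT_LENGTH") or 0)
    stream = environ["wsgi.input"]
    received = 0
    # Each read() can block until the client sends more data; a single-threaded
    # worker can do nothing else while it waits here.
    while remaining > 0:
        chunk = stream.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            break  # client disconnected early
        received += len(chunk)
        remaining -= len(chunk)

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("received %d bytes\n" % received).encode("ascii")]


def call(environ):
    """Drive the app without a server, returning (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(upload_app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    payload = b"x" * 200000
    env = {
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
        "HTTP_AUTHORIZATION": "Basic dGVzdDp0ZXN0",
    }
    print(call(env))  # ('200 OK', b'received 200000 bytes\n')
```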

On the other hand, it is the job of application frameworks to abstract out the common uses, and Django does this, letting you set an upload directory and the size above which uploads are spooled to a temporary file instead of held in memory, along with a few other options. However, this is still bound by the HTTP constraints described above. This usually works out well enough, because most servers provide enough bandwidth and worker threads to use the hardware efficiently. A typical worker pool (of, say, 5 to 50 concurrent threads or processes) that is entirely busy servicing file uploads probably doesn't leave the server machine with any spare I/O capacity for other requests anyway.
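For reference, a `settings.py` fragment with the upload-related options that exist in Django 1.3 might look like the following; the paths are hypothetical. Note that `FILE_UPLOAD_MAX_MEMORY_SIZE` is a spill-to-disk threshold, not a cap on upload size; a hard cap needs something else, for example Apache's `LimitRequestBody` directive or a custom upload handler. None of these settings change the fact that a worker is busy while the body is being read.

```python
# settings.py (fragment): upload-related settings available in Django 1.3.
# The paths below are hypothetical examples.

# Where FileField/ImageField uploads end up with the default storage backend.
MEDIA_ROOT = "/srv/example/media/"

# Where uploads larger than FILE_UPLOAD_MAX_MEMORY_SIZE are spooled while the
# request is still being read. None (the default) means the system temp dir.
FILE_UPLOAD_TEMP_DIR = "/srv/example/upload-tmp/"

# Threshold in bytes (Django's default is 2.5 MB). Larger uploads are written
# to a temporary file instead of being kept in memory; they are not rejected.
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440

# Handlers consulted for each uploaded file (these are Django's defaults).
FILE_UPLOAD_HANDLERS = (
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
)
```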

If your application does not fit this model (perhaps most requests are CPU-bound or work mostly in RAM, and only a few do disk I/O for file uploads), then you will need some custom tuning anyway. For instance, you might refactor your service onto an asynchronous framework so that it can efficiently handle thousands to hundreds of thousands of concurrent but slow requests, or simply route different kinds of requests to different application servers. Apache is a good general-purpose application server, but it's rarely the fastest for a specific application.
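To illustrate why an asynchronous framework helps with many slow requests, here is a small simulation using Python's `asyncio`. This is a swapped-in example: neither Django 1.3 nor mod_wsgi works this way, `asyncio.sleep` stands in for waiting on network data, and all names and numbers are hypothetical. Hundreds of trickling uploads share one thread, and a quick request is still answered while they are in flight.

```python
import asyncio
import time


async def slow_upload(n_chunks, delay):
    """Simulate a client that sends its body in n_chunks, delay seconds apart."""
    received = 0
    for _ in range(n_chunks):
        # While this coroutine waits for the next chunk, the event loop
        # is free to run other requests on the same thread.
        await asyncio.sleep(delay)
        received += 1
    return received


async def quick_request():
    """A request that needs no request body and returns immediately."""
    return "ok"


async def main(n_uploads=500, n_chunks=5, delay=0.02):
    start = time.perf_counter()
    uploads = [asyncio.create_task(slow_upload(n_chunks, delay)) for _ in range(n_uploads)]
    await asyncio.sleep(delay)  # let the uploads start trickling in
    answer = await quick_request()
    # Count uploads still in progress when the quick request was answered.
    pending = sum(not task.done() for task in uploads)
    results = await asyncio.gather(*uploads)
    total = time.perf_counter() - start
    return answer, pending, results, total


if __name__ == "__main__":
    answer, pending, results, total = asyncio.run(main())
    print(answer, pending, len(results), "%.2f s" % total)
```

Run one after another, 500 uploads of 5 x 0.02 s each would take about 50 s; interleaved on the event loop they finish in roughly the time of a single upload.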

Graham Dumpleton

Since you are using single-threaded mod_wsgi daemon processes, yes, the whole daemon process will be occupied for the entire duration of the upload. This is because the request content is streamed straight through to the Django application; Apache does not pre-read the request content before the request is passed to mod_wsgi.
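A toy model of the situation in the question, assuming three single-threaded daemon processes and three concurrent uploads: threads in a `ThreadPoolExecutor` stand in for the daemon processes and `time.sleep` stands in for reading a slow request body. This is not how mod_wsgi is implemented; `handle_upload`, `handle_page`, `simulate` and `UPLOAD_SECONDS` are hypothetical names used only to show the queueing effect.

```python
import time
from concurrent.futures import ThreadPoolExecutor

UPLOAD_SECONDS = 0.3  # hypothetical time one client takes to send its file


def handle_upload():
    # Stands in for a view that is blocked reading the request body.
    time.sleep(UPLOAD_SECONDS)
    return "upload done"


def handle_page(submitted_at):
    # A cheap request: report how long it sat in the queue before starting.
    return time.perf_counter() - submitted_at


def simulate(workers=3, uploads=3):
    # Each pool thread models one single-threaded daemon process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        upload_futures = [pool.submit(handle_upload) for _ in range(uploads)]
        page_future = pool.submit(handle_page, time.perf_counter())
        waited = page_future.result()
        return [f.result() for f in upload_futures], waited


if __name__ == "__main__":
    _, waited = simulate(workers=3, uploads=3)
    print("all workers busy: page waited %.2f s" % waited)
    _, waited = simulate(workers=4, uploads=3)
    print("one worker free: page waited %.2f s" % waited)
```

With every worker busy, the cheap request waits for an upload to finish; with one spare worker it starts almost immediately. Whether adding threads or processes is the right fix for a given application is a separate question; the model only shows the queueing.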
