Reputation: 7867
I'm writing an app that will allow the user to upload data in a file; the app will process this data and email the results to the user. Processing may take some time, so I would like to handle it in a separate Python script rather than wait in the view for it to complete. The script and the view don't need to communicate, as the script will pick up the data from a file written by the view. The view will just put up a message like "Thanks for uploading your data - the results will be emailed to you".
What's the best way to do this in Django? Spawn off a separate process? Put something on a queue?
Some example code would be greatly appreciated. Thanks.
Upvotes: 10
Views: 5293
Reputation: 11434
You could use multiprocessing. http://docs.python.org/library/multiprocessing.html
Essentially,
from multiprocessing import Process

from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response


def _pony_express(objs, action, user, foo=None):
    # unleash the beasts: do the heavy processing here
    ...

def bulk_action(request, t):
    ...
    objs = model.objects.filter(pk__in=pks)
    if request.method == 'POST':
        objs.update(is_processing=True)
        p = Process(target=_pony_express,
                    args=(objs, action, request.user),
                    kwargs={'foo': foo})
        p.start()  # returns immediately; the child process does the work
        return HttpResponseRedirect(next_url)
    context = {'t': t, 'action': action, 'objs': objs, 'model': model}
    return render_to_response(...)
Upvotes: 1
Reputation: 1108
I currently have a project with similar requirements (just more complicated^^).
Never spawn a subprocess or thread from your Django view. You have no control over the Django process: it could be killed, paused, etc. before the end of the task, since it is controlled by the web server (e.g. Apache via WSGI).
What I would do is use an external script running in a separate process. I think you have two options for starting it: run it periodically (e.g. from cron) so it picks up any new files, or keep it running as a long-lived daemon that watches for uploads.
Now you probably need to access the information in the Django models to email the user at the end. Here you have a couple of options: import your Django settings and models directly in the external script, or have the script report back to the Django application via POST requests.
I would go for an external process that either imports the modules or uses POST requests; this way it is much more flexible. You could, for example, use the multiprocessing module to process several files at the same time (and thus use multi-core machines efficiently).
A basic workflow would be: the view writes the uploaded data to a file and returns immediately, and the external script picks the file up, processes it, and emails the results to the user, as in the sketch below.
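A minimal sketch of such a standalone script, assuming a project named mysite, an upload directory the view writes to, and a stand-in process_file() helper (django.setup() needs Django 1.7+):

import os

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

import django
django.setup()  # load the app registry so models can be imported

from django.core.mail import send_mail

UPLOAD_DIR = '/var/spool/myapp/uploads'  # wherever the view writes the files

def process_file(path):
    # stand-in for the real processing
    with open(path) as f:
        return f.read().upper()

def main():
    for name in os.listdir(UPLOAD_DIR):
        path = os.path.join(UPLOAD_DIR, name)
        results = process_file(path)
        # a real app would record which user owns which file, e.g. in a model
        send_mail('Your results', results,
                  'noreply@example.com', ['user@example.com'])
        os.remove(path)  # remove so the file is not processed twice

if __name__ == '__main__':
    main()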
My project involves some really CPU-demanding processing. I currently use an external process that hands processing jobs to a pool of worker processes (that's basically what Celery could do for you) and reports progress and results back to the Django application via POST requests. It works really well and is relatively scalable, but I will soon change it to use Celery on a cluster.
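For illustration, the progress reporting from a worker can be a plain POST; the endpoint below is made up:

import urllib.parse
import urllib.request

def report_progress(job_id, percent):
    # POST the current progress to a (hypothetical) endpoint in the Django app
    data = urllib.parse.urlencode({'percent': percent}).encode()
    url = 'http://localhost:8000/jobs/%s/progress/' % job_id
    urllib.request.urlopen(url, data=data)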
Upvotes: 4
Reputation: 42795
You could spawn a thread to do the processing. It wouldn't really have much to do with Django; the view function would need to kick off the worker thread and that's it.
If you really want a separate process, you'll need the subprocess module. But do you really need to redirect standard I/O or allow external process control?
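If you did take that route, the launch itself is short; a sketch, with process_data.py standing in for your processing script:

import subprocess
import sys

def start_processing(path):
    # start the worker script and return immediately, without waiting for it
    subprocess.Popen([sys.executable, '/path/to/process_data.py', path])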
A threading example:
from threading import Thread
from MySlowThing import SlowProcessingFunction # or whatever you call it
# ...
Thread(target=SlowProcessingFunction, args=(), kwargs={}).start()
I haven't done a program where I didn't want to track the threads' progress, so I don't know if this works without storing the Thread object somewhere. If you need to do that, it's pretty simple:
allThreads = []

# ... inside the view function:
thread = Thread(target=SlowProcessingFunction, args=(), kwargs={})
thread.start()
allThreads.append(thread)  # append mutates the list in place, so no global statement is needed
You can remove threads from the list when thread.is_alive() returns False:
def cull_threads():
    global allThreads
    allThreads = [thread for thread in allThreads if thread.is_alive()]
Upvotes: 3
Reputation: 5103
The simplest possible solution is to write a custom management command that searches for all the unprocessed files, processes them, and then emails the user. Management commands run inside the Django framework, so they have access to all models, db connections, etc., but you can call them from wherever you like, for example from crontab. A sketch is below.
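A minimal sketch of such a command, assuming a hypothetical Upload model that tracks the files (saved as myapp/management/commands/process_uploads.py):

from django.core.management.base import BaseCommand
from django.core.mail import send_mail

from myapp.models import Upload  # hypothetical model tracking uploaded files

class Command(BaseCommand):
    help = 'Process pending uploads and email the results'

    def handle(self, *args, **options):
        for upload in Upload.objects.filter(processed=False):
            with open(upload.path) as f:
                results = f.read().upper()  # stand-in for the real processing
            send_mail('Your results', results,
                      'noreply@example.com', [upload.user.email])
            upload.processed = True
            upload.save()

You would then run python manage.py process_uploads from crontab, e.g. every five minutes.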
If you care about the timeframe between the file being uploaded and processing starting, you could use a framework like Celery, which is basically a helper library for using a message queue and running workers listening on the queue. This would give pretty low latency, but on the other hand, simplicity might be more important to you.
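For comparison, a minimal Celery task might look like this (the broker URL and the task body are assumptions); the view would just call process_upload.delay(path, email) and return:

from celery import Celery
from django.core.mail import send_mail

app = Celery('myapp', broker='redis://localhost:6379/0')

@app.task
def process_upload(path, email):
    with open(path) as f:
        results = f.read().upper()  # stand-in for the real processing
    send_mail('Your results', results, 'noreply@example.com', [email])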
I would strongly advise against starting threads or spawning processes in your views: the threads would be running inside the Django process and could break your web server (depending on your configuration), and a spawned child process would inherit everything from the Django process, which you probably don't want. It is better to keep this stuff separate.
Upvotes: 20