FunLovinCoder

Reputation: 7867

Django: Should I kick off a separate process?

I'm writing an app that will allow the user to upload data in a file; the app will process this data and email the results to the user. Processing may take some time, so I would like to handle it separately in a Python script rather than wait in the view for it to complete. The Python script and the view don't need to communicate, as the script will pick up the data from a file written by the view. The view will just put up a message like "Thanks for uploading your data - the results will be emailed to you".

What's the best way to do this in Django? Spawn off a separate process? Put something on a queue?

Some example code would be greatly appreciated. Thanks.

Upvotes: 10

Views: 5293

Answers (4)

Skylar Saveland

Reputation: 11434

You could use the multiprocessing module: http://docs.python.org/library/multiprocessing.html

Essentially,

from multiprocessing import Process

from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response

def _pony_express(objs, action, user, foo=None):
    # unleash the beasts: do the slow processing here
    pass

def bulk_action(request, t):

    ...
    objs = model.objects.filter(pk__in=pks)

    if request.method == 'POST':
        objs.update(is_processing=True)

        # hand the slow work to a child process and return immediately
        p = Process(target=_pony_express, args=(objs, action, request.user), kwargs={'foo': foo})
        p.start()

        return HttpResponseRedirect(next_url)

    context = {'t': t, 'action': action, 'objs': objs, 'model': model}
    return render_to_response(...)

Upvotes: 1

Marc Demierre

Reputation: 1108

I currently have a project with similar requirements (just more complicated^^).

Never spawn a subprocess or thread from your Django view. You have no control over the Django process, and it could be killed or paused before the end of the task; it is controlled by the web server (e.g. Apache via WSGI).

What I would do is use an external script, which would run in a separate process. I see three solutions:

  • A process that is always running, watching the directory where you put your files. It would, for example, check the directory every ten seconds and process any new files
  • The same as above, but run by cron every x minutes (cron's finest granularity). This has basically the same effect
  • Use Celery to create worker processes and add jobs to the queue from your Django application. You would then get the results back by one of the means Celery provides (a sketch of this follows the list)
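
For the Celery option, a rough sketch of the task and the view-side call (the broker URL, file paths and processing logic here are my assumptions, not part of the answer, and the worker is assumed to be configured with your Django settings):

# tasks.py
from celery import Celery
from django.core.mail import send_mail

app = Celery('uploads', broker='amqp://localhost')  # broker URL is an assumption

@app.task
def process_upload(path, recipient):
    with open(path) as f:
        results = f.read().upper()  # stand-in for the real processing
    send_mail('Your results', results, 'noreply@example.com', [recipient])

# in the Django view, after saving the uploaded file:
# process_upload.delay(saved_path, request.user.email)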

Now you probably need to access the information in the Django models to email the user at the end. Here you have several solutions:

  • Import your modules (models etc.) from the external script (a sketch of this follows the list)
  • Implement the external script as a custom command (as knutin suggested)
  • Communicate the results to the Django application via a POST request, for example. Then you would do the email sending, status changes etc. in a normal Django view.
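
For the first option, a rough sketch of bootstrapping Django from an external script so that it can use the ORM and send mail ('myproject.settings' and the Upload model are assumptions, not part of the original answer):

# process_standalone.py
import os
import django

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
django.setup()  # Django >= 1.7; older versions only needed the environment variable

from django.core.mail import send_mail
from myapp.models import Upload  # hypothetical model tracking uploads

def finish(upload_id, results):
    upload = Upload.objects.get(pk=upload_id)
    send_mail('Your results', results, 'noreply@example.com', [upload.email])
    upload.is_processing = False
    upload.save()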

I would go for an external process that either imports the modules or uses a POST request; this way it is much more flexible. You could, for example, make use of the multiprocessing module to process several files at the same time (thus using multi-core machines efficiently).

A basic workflow would be (a sketch follows the list):

  1. Check the directory for new files
  2. For each file (can be parallelized):
    1. Process
    2. Send email or notify your Django application
  3. Sleep for a while
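
A minimal sketch of that loop; the directory layout and the body of process_file() are my assumptions. Step 2 is parallelized with a multiprocessing pool, as suggested above:

import os
import time
from multiprocessing import Pool

INCOMING = '/var/uploads/incoming'  # assumed directory layout
DONE = '/var/uploads/done'

def process_file(name):
    path = os.path.join(INCOMING, name)
    # ... process the data, then email or notify the Django application ...
    os.rename(path, os.path.join(DONE, name))  # so it isn't picked up again

if __name__ == '__main__':
    pool = Pool()  # one worker per CPU core by default
    while True:
        new_files = os.listdir(INCOMING)       # 1. check the directory
        if new_files:
            pool.map(process_file, new_files)  # 2. process each file in parallel
        time.sleep(10)                         # 3. sleep for a while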

My project contains really CPU-demanding processing. I currently use an external process that gives processing jobs to a pool of worker processes (that's basically what Celery could do for you) and reports the progress and results back to the Django application via POST requests. It works really well and is relatively scalable, but I will soon change it to use Celery on a cluster.

Upvotes: 4

Mike DeSimone

Reputation: 42795

You could spawn a thread to do the processing. It wouldn't really have much to do with Django; the view function would need to kick off the worker thread and that's it.

If you really want a separate process, you'll need the subprocess module. But do you really need to redirect standard I/O or allow external process control?
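
If you do want a separate process, a minimal fire-and-forget sketch (the script name and file path are hypothetical):

import subprocess
import sys

# No pipes are attached, so the view doesn't wait for the child to finish.
subprocess.Popen([sys.executable, 'process_upload.py', '/path/to/uploaded/file'])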

Example of the threading approach:

from threading import Thread
from MySlowThing import SlowProcessingFunction # or whatever you call it

# ...

Thread(target=SlowProcessingFunction, args=(), kwargs={}).start()

I haven't done a program where I didn't want to track the threads' progress, so I don't know if this works without storing the Thread object somewhere. If you need to do that, it's pretty simple:

allThreads = []

# ...

# inside the view function:
global allThreads
thread = Thread(target=SlowProcessingFunction, args=(), kwargs={})
thread.start()
allThreads.append(thread)

You can remove threads from the list when thread.is_alive() returns False:

def cull_threads():
    global allThreads
    allThreads = [thread for thread in allThreads if thread.is_alive()]

Upvotes: 3

knutin

Reputation: 5103

The simplest possible solution is to write a custom management command that searches for all the unprocessed files, processes them, and then emails the user. Management commands run inside the Django framework, so they have access to all models, database connections, etc., but you can call them from anywhere, for example from crontab.
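
A minimal sketch of such a command (the Upload model, its fields, and the processing step are my assumptions):

# myapp/management/commands/process_uploads.py
from django.core.management.base import BaseCommand
from django.core.mail import send_mail
from myapp.models import Upload  # hypothetical model

class Command(BaseCommand):
    help = 'Process pending uploads and email the results'

    def handle(self, *args, **options):
        for upload in Upload.objects.filter(processed=False):
            results = upload.data.upper()  # stand-in for the real processing
            send_mail('Your results', results, 'noreply@example.com', [upload.email])
            upload.processed = True
            upload.save()

You would then run it with python manage.py process_uploads, for example from a crontab entry.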

If you care about the delay between the file being uploaded and processing starting, you could use a framework like Celery, which is basically a helper library for using a message queue and running workers that listen on the queue. This would give you pretty low latency, but on the other hand, simplicity might be more important to you.

I would strongly advise against starting threads or spawning processes in your views, as the threads would run inside the Django process and could destroy your web server (depending on your configuration). The child process would inherit everything from the Django process, which you probably don't want. It is better to keep this stuff separate.

Upvotes: 20
