Reputation: 10069
I have defined a Django task (it gets launched using ./manage.py task_name). This task reads a set of objects from the database and performs an operation (usually sending a ping) on each of them, writing each individual result back to the database.
Currently I have a plain for loop, but it's obviously too slow, because it waits for each ping to finish before starting the next one. So my question here is: what's the best way of parallelizing the operations?
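For concreteness, the current loop is roughly this (ping and write_to_database stand in for my actual code):

for obj in MyModel.objects.all():  # MyModel is a placeholder for the real model
    result = ping(obj)             # blocks until this ping finishes
    write_to_database(result)      # the next object only starts after this one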
From what I've read, the best way seems to be using Pool from the multiprocessing module, something like the code in this answer.
Upvotes: 1
Views: 110
Reputation: 19114
For your task, which appears pretty simple, multiprocessing is probably the easiest approach, if only because it's already part of the stdlib. You could do it something like this (untested!):
from multiprocessing import Pool

def run_process(record):
    # ping() is the per-record operation described in the question
    return ping(record)

pool = Pool(processes=10)                       # ten worker processes
results = pool.map_async(run_process, records)  # records: the objects read from the DB
for r in results.get():                         # .get() blocks until every ping has finished
    write_to_database(r)
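One variation worth noting: since the pings are I/O-bound rather than CPU-bound, multiprocessing.dummy exposes the same Pool API backed by threads, so the snippet above also works with only the import changed:

from multiprocessing.dummy import Pool  # identical interface, thread-based workers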
Upvotes: 1
Reputation: 4467
I would simply recommend Celery.
Write Celery tasks for the operations you want executed in parallel/asynchronously, and let Celery handle the concurrency; your own code gets rid of the messy process management.
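A minimal sketch of what that could look like; the broker URL is an assumption, and ping() / write_to_database() stand in for your existing per-object code:

# tasks.py
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # point at your own broker

@app.task
def ping_record(record_id):
    # runs on a Celery worker, one invocation per object
    result = ping(record_id)
    write_to_database(result)

Your management command then just enqueues one task per object and returns immediately:

for obj in MyModel.objects.all():  # MyModel is a placeholder for your model
    ping_record.delay(obj.pk)      # schedules the task on a worker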
Upvotes: 1
Reputation: 11396
I'd say that the best tool here would be an event-driven networking engine like the Twisted library.
Unlike multithreading/multiprocessing solutions, event-driven networking engines shine at I/O-intensive work: rather than context switching and waiting on blocking operations, they use system resources in the most efficient way.
One way to use Twisted is to write a Scrapy spider (Scrapy is built on top of Twisted) that handles both the external network calls, like those ping requests you mentioned, and writing the responses back to the database. A few guidelines for writing such a spider are sketched below.
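Roughly: generate one request per record in start_requests() and write each result back in the parse() callback. A minimal sketch; Record, ping_url, and write_to_database are hypothetical stand-ins for your own models and persistence code:

import scrapy

class PingSpider(scrapy.Spider):
    name = 'ping'

    def start_requests(self):
        # read the objects to ping from the database
        for record in Record.objects.all():
            yield scrapy.Request(
                record.ping_url,                     # assumes each record has a URL to ping
                callback=self.parse,
                cb_kwargs={'record_id': record.pk},  # pass the record id to the callback
            )

    def parse(self, response, record_id):
        # write each result back to the database as the response arrives
        write_to_database(record_id, response.status)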
Once you have this spider written, simply launch it from your Django command or straight from the shell:
scrapy crawl <spider name>
Upvotes: 0