Reputation: 10069
I have defined a Django task (it gets launched using ./manage.py task_name). This task reads a set of objects from the database and performs an operation (usually sending a ping) on each of them, writing each individual result back to the database.
Currently I have a plain for loop, but it's obviously too slow, because it waits for each ping to finish before starting the next one. So my question here is: what's the best way of parallelizing the operations?
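For concreteness, the current loop is roughly this (ping and write_to_database stand in for my actual code):

for obj in MyModel.objects.all():  # MyModel is a placeholder for the real model
    result = ping(obj)             # blocks until this ping finishes
    write_to_database(result)      # the next object only starts after this one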
From what I've read, the best way seems to be using Pool from the multiprocessing module, something like the code in this answer.
Upvotes: 1
Views: 110
Reputation: 19114
For your task, which appears pretty simple, multiprocessing is probably the easiest approach, if only because it's already part of the stdlib. You could do it something like this (untested!):
from multiprocessing import Pool

def run_process(record):
    # ping() is the per-record operation described in the question
    return ping(record)

pool = Pool(processes=10)                       # ten worker processes
results = pool.map_async(run_process, records)  # records: the objects read from the DB
for r in results.get():                         # .get() blocks until every ping has finished
    write_to_database(r)
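One variation worth noting: since the pings are I/O-bound rather than CPU-bound, multiprocessing.dummy exposes the same Pool API backed by threads, so the snippet above also works with only the import changed:

from multiprocessing.dummy import Pool  # identical interface, thread-based workers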
Upvotes: 1
Reputation: 4467
I would simply recommend Celery.
Write Celery tasks for the operations you want executed in parallel/asynchronously, and let Celery handle the concurrency; your own code gets rid of the messy process management.
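A minimal sketch of what that could look like; the broker URL is an assumption, and ping() / write_to_database() stand in for your existing per-object code:

# tasks.py
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # point at your own broker

@app.task
def ping_record(record_id):
    # runs on a Celery worker, one invocation per object
    result = ping(record_id)
    write_to_database(result)

Your management command then just enqueues one task per object and returns immediately:

for obj in MyModel.objects.all():  # MyModel is a placeholder for your model
    ping_record.delay(obj.pk)      # schedules the task on a worker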
Upvotes: 1
Reputation: 11396
I'd say that the best tool here would be an event-driven networking engine like the Twisted library.
Unlike multithreading/multiprocessing solutions, event-driven networking engines shine at I/O-intensive work: rather than context switching and waiting on blocking operations, they use system resources in the most efficient way.
One way to use Twisted is to write a Scrapy spider (Scrapy is built on top of Twisted) that handles both the external network calls, like those ping requests you mentioned, and writing the responses back to the database. A few guidelines for writing such a spider are sketched below.
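Roughly: generate one request per record in start_requests() and write each result back in the parse() callback. A minimal sketch; Record, ping_url, and write_to_database are hypothetical stand-ins for your own models and persistence code:

import scrapy

class PingSpider(scrapy.Spider):
    name = 'ping'

    def start_requests(self):
        # read the objects to ping from the database
        for record in Record.objects.all():
            yield scrapy.Request(
                record.ping_url,                     # assumes each record has a URL to ping
                callback=self.parse,
                cb_kwargs={'record_id': record.pk},  # pass the record id to the callback
            )

    def parse(self, response, record_id):
        # write each result back to the database as the response arrives
        write_to_database(record_id, response.status)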
Once you have this spider written, simply launch it from your Django command or straight from the shell:
scrapy crawl <spider name>
Upvotes: 0