Reputation: 12462
Currently I have ~5000 jobs that I need to dispatch, and I'm dispatching them in a loop:
for i, job in enumerate(Jobs):
    res = process_job.apply_async(args=[job], queue='job_queue')
It took about 18 seconds to complete the loop.
I've tried sending them all in a single group() call, but that seems to be just as slow.
Any suggestions on how to dispatch this many jobs quickly?
Also, I've tried to parallelize the dispatch via multiprocessing, but the thread/process overhead seems to negate the benefit as well.
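For reference, the group() attempt was roughly along these lines (a minimal sketch, assuming the same process_job task and Jobs list; a group still publishes one message per job, so it doesn't save much over the plain loop):

from celery import group

# build one signature per job and send them as a group
job_group = group(
    process_job.s(job).set(queue='job_queue') for job in Jobs
)
result = job_group.apply_async()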
Upvotes: 3
Views: 773
Reputation: 1
Maybe you should try celery-dispatcher, which uses a separate thread to dispatch tasks.
You can yield subtasks from the main task and handle each result in another function:
from celery import shared_task
from celery_dispatcher import dispatch  # import path assumed from the package name

def handle_result(root_id, task_id, retval, **kwargs):
    # called once for each finished subtask
    print(retval)

@shared_task
def square(i):
    return i * i

@dispatch(receiver=handle_result)
@shared_task
def calc():
    # yield (task, args) pairs; the dispatcher sends them from its own thread
    for i in range(10):
        yield square, (i,)
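Kicking this off is then a single enqueue (a minimal usage sketch; the subtasks are fanned out on the worker side and each result is passed to the receiver hook):

calc.delay()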
Upvotes: 0
Reputation: 3542
It would depend on how Jobs is retrieved, but we handle this with a dispatcher task, so the caller only has to enqueue that one task:
from celery import shared_task

@shared_task
def process_job(job):
    # do stuff for this job
    ...

@shared_task
def dispatcher():
    # fan the jobs out from inside a worker instead of the calling process
    for job in Jobs:
        process_job.apply_async(args=[job], queue='job_queue')
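The producer then pays for one message instead of ~5000 (a sketch, reusing the queue name from the question):

# one cheap enqueue from the calling process; the ~5000 apply_async calls
# now run inside whichever worker picks up dispatcher()
dispatcher.apply_async(queue='job_queue')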
Upvotes: 2