Hyperion

Reputation: 2625

multiprocessing pool.map not processing list in order

I have this script to process some URLs in parallel:

import multiprocessing
import time

list_of_urls = []

for i in range(1, 1000):
    list_of_urls.append('http://example.com/page=' + str(i))

def process_url(url):
    page_processed = url.split('=')[1]
    print('Processing page %s' % page_processed)
    time.sleep(5)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(process_url, list_of_urls)

The list is ordered, but when I run it, the script doesn't pick the URLs from the list in order:

Processing page 1
Processing page 64
Processing page 127
Processing page 190
Processing page 65
Processing page 2
Processing page 128
Processing page 191

Instead, I would like it to process pages 1, 2, 3, 4 first, and then continue following the order of the list. Is there an option to do this?

Upvotes: 8

Views: 7950

Answers (2)

grzgrzgrz3

Reputation: 350

If you do not pass a chunksize argument, map will calculate one with this algorithm:

chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
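
Plugging the question's numbers into that formula shows where the jumps of 63 in the output come from; this is just an illustrative check, and the variable names below are made up:

n_tasks = 999          # len(list_of_urls) from the question
n_workers = 4          # the pool size
chunksize, extra = divmod(n_tasks, n_workers * 4)  # divmod(999, 16) -> (62, 7)
if extra:
    chunksize += 1
print(chunksize)  # 63 -- so the worker batches start at pages 1, 64, 127, 190, ...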

map cuts your iterable into these chunks (task batches) and hands each chunk to a worker process as a single unit. That is why the pages are not started in list order. The solution is to set the chunk size to 1.

import multiprocessing
import time

def process(task):
    print('task:', task)
    time.sleep(1)

if __name__ == '__main__':
    list_test = list(range(10))
    pool = multiprocessing.Pool(processes=3)
    # chunksize=1 hands tasks to the workers one at a time, in list order
    pool.map(process, list_test, chunksize=1)

task: 0
task: 1
task: 2
task: 3
task: 4
task: 5
task: 6
task: 7
task: 8
task: 9

Upvotes: 12

cranky_monkey

Reputation: 21

Multiprocessing is an asynchronous operation, meaning it is by definition non-sequential. Workers (threads, or in Python's case processes) pull URLs from your list, and there is no guarantee which one will finish first. So URL 1 might begin processing before URL 64, but because of randomness in network I/O, for example, URL 64 might finish first.

First, ask yourself whether you truly need to perform these operations in order. If the answer is yes, your best bet is a blocking step: one that forces all parallel computations to complete first, and then sorts the completed results.
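
A minimal sketch of that blocking-then-sorting idea; the worker function and URL list here are stand-ins for the question's, and the names are illustrative. Results arrive in completion order, and the sort restores list order afterwards:

import multiprocessing

def process_url(url):
    # Stand-in worker: return (page number, result) so the output can be sorted.
    page = int(url.split('=')[1])
    return page, 'processed page %d' % page

if __name__ == '__main__':
    urls = ['http://example.com/page=' + str(i) for i in range(1, 10)]
    pool = multiprocessing.Pool(processes=4)
    # imap_unordered yields results as workers finish, in no fixed order;
    # exhausting it is the blocking step that waits for all of them.
    results = list(pool.imap_unordered(process_url, urls))
    pool.close()
    pool.join()
    # Sort the completed results back into page order afterwards.
    results.sort(key=lambda pair: pair[0])
    print([page for page, _ in results])  # [1, 2, 3, 4, 5, 6, 7, 8, 9]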

So if your list of URLs is very large and you want some element of order while still taking advantage of parallelization, you can chunk your list and then run each chunk sequentially through the parallel logic above.
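
A minimal sketch of that chunked approach, assuming a chunk size of 4 to match the pool size; the chunked helper is made up for illustration. Each chunk must finish completely before the next one starts, so pages 1-4 run first, then 5-8, and so on, at the cost of some idle workers at the end of each chunk:

import multiprocessing
import time

def process_url(url):
    print('Processing page %s' % url.split('=')[1])
    time.sleep(5)

def chunked(seq, size):
    # Yield successive slices of seq, each with at most `size` elements.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

if __name__ == '__main__':
    list_of_urls = ['http://example.com/page=' + str(i) for i in range(1, 1000)]
    pool = multiprocessing.Pool(processes=4)
    for chunk in chunked(list_of_urls, 4):
        # Blocks until all 4 pages in this chunk are done before moving on.
        pool.map(process_url, chunk)
    pool.close()
    pool.join()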

Upvotes: 0
