Yuchen Huang

Reputation: 311

Why does my code take longer when I use multiprocessing?

The code I use without multiprocessing is below. It takes 0:00:03.044280 to run:

from datetime import datetime

def execute_it():
    number = 10000000
    listing_1 = range(number)
    listing_2 = range(number)
    listing_3 = range(number)
    start = datetime.now()
    task(listing_1, listing_2, listing_3)
    print datetime.now() - start

def task(listing_1, listing_2, listing_3):
    # Add the three values at each position; the result is discarded
    for l1, l2, l3 in zip(listing_1, listing_2, listing_3):
        l1 + l2 + l3

I want to use multiprocessing to reduce the running time. The code I tried is below:

import multiprocessing as mp
from datetime import datetime

def execute_it():
    number = 10000000
    listing_1 = list(range(number))
    listing_2 = list(range(number))
    listing_3 = list(range(number))

    # One (l1, l2, l3) tuple per task
    params = zip(listing_1, listing_2, listing_3)

    start = datetime.now()
    pool = mp.Pool(processes=5)
    pool.map(task, params)
    pool.close()
    print datetime.now() - start

def task(params):
    params[0] + params[1] + params[2]

This version takes 0:00:15.654919!

What is wrong with my code? I am sure both versions do the same work.

Upvotes: 2

Views: 51

Answers (1)

khachik

Reputation: 28703

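The multiprocessing version takes longer because it does the same work as the single-process version plus extra overhead: creating the worker processes, pickling each argument tuple, sending it to a worker, and collecting the results. Each task here is a single addition, so that overhead dominates.

You can replace zip with itertools.izip and pool.map with pool.imap to get the expected parallelism; otherwise all the heavy processing will happen in the main process. In Python 2, zip builds the full list of tuples up front, while izip yields them lazily.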

from itertools import izip
...

def execute_it():
    number = 10000000
    listing_1 = list(range(number))
    listing_2 = list(range(number))
    listing_3 = list(range(number))

    params = izip(listing_1, listing_2, listing_3)

    start = datetime.now()
    pool = mp.Pool(processes=5)
    pool.imap(task, params)
    pool.close()
    pool.join()  # wait for the workers to finish before reading the clock
    print datetime.now() - start
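For readers on Python 3, where zip is already lazy and izip no longer exists, the same idea can be sketched roughly as follows. This is a minimal sketch, not a benchmark: the chunksize value, the default arguments, and having task return its sum are assumptions made for illustration. Passing chunksize to imap sends the tuples to the workers in batches, which should reduce the per-task communication overhead described above.

```python
import multiprocessing as mp
from datetime import datetime


def task(params):
    # Return the sum so the parent process has a result to consume.
    return params[0] + params[1] + params[2]


def execute_it(number=10000000, processes=5, chunksize=10000):
    # In Python 3, zip is lazy, so no list of tuples is built up front.
    params = zip(range(number), range(number), range(number))

    start = datetime.now()
    with mp.Pool(processes=processes) as pool:
        # chunksize (an illustrative value) batches the tuples per worker message;
        # sum() consumes every result before the pool is torn down.
        total = sum(pool.imap(task, params, chunksize=chunksize))
    print(datetime.now() - start)
    return total


if __name__ == "__main__":
    # A smaller workload than in the question, to keep the demo quick.
    execute_it(number=1000000)
```

Even with batching, a task this small may still run faster in a single process; multiprocessing tends to pay off when each task does substantially more work than the cost of shipping its arguments.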

Upvotes: 1
