Reputation: 23
I want to do something in parallel, but it always runs slower than the sequential version. Below are two code snippets that can be compared: the multiprocessing way takes 12 seconds on my laptop, the sequential way only 3 seconds. I thought multiprocessing was supposed to be faster. I know the task itself makes no sense this way; it is only meant to compare the two approaches. I also know bubble sort can be replaced by faster algorithms.
Thanks.
Multiprocessing way:
from multiprocessing import Process, Manager
import random

# Build a list of 1000 random integers to draw samples from.
myArray = []
for i in range(1000):
    myArray.append(random.randint(1, 1000))

def getRandomSample(myset, sample_size):
    # Pick sample_size random indices and return the corresponding items.
    sorted_list = sorted(random.sample(xrange(len(myset)), sample_size))
    return [myset[i] for i in sorted_list]

def bubbleSort(iterator, alist, return_dictionary):
    # Bubble-sort a 100-element random sample of alist.
    sample_list = getRandomSample(alist, 100)
    for passnum in range(len(sample_list) - 1, 0, -1):
        for i in range(passnum):
            if sample_list[i] > sample_list[i + 1]:
                temp = sample_list[i]
                sample_list[i] = sample_list[i + 1]
                sample_list[i + 1] = temp
    return_dictionary[iterator] = sample_list

if __name__ == '__main__':
    manager = Manager()
    return_dictionary = manager.dict()
    jobs = []
    for i in range(3000):
        p = Process(target=bubbleSort, args=(i, myArray, return_dictionary))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print return_dictionary.values()
The other way:
import random

# Build the same list of 1000 random integers.
myArray = []
for i in range(1000):
    myArray.append(random.randint(1, 1000))

def getRandomSample(myset, sample_size):
    # Pick sample_size random indices and return the corresponding items.
    sorted_list = sorted(random.sample(xrange(len(myset)), sample_size))
    return [myset[i] for i in sorted_list]

def bubbleSort(alist):
    # Bubble-sort a 100-element random sample of alist.
    sample_list = getRandomSample(alist, 100)
    for passnum in range(len(sample_list) - 1, 0, -1):
        for i in range(passnum):
            if sample_list[i] > sample_list[i + 1]:
                temp = sample_list[i]
                sample_list[i] = sample_list[i + 1]
                sample_list[i + 1] = temp
    return sample_list

if __name__ == '__main__':
    results = []
    for i in range(3000):
        results.append(bubbleSort(myArray))
    print results
Upvotes: 2
Views: 1434
Reputation: 17263
Multiprocessing is faster if you have multiple cores and parallelize properly. In your example you create 3000 processes, which causes an enormous amount of context switching between them. Instead, use a Pool to schedule the jobs across a fixed number of worker processes:
from multiprocessing import Pool

def bubbleSort(alist):
    sample_list = getRandomSample(alist, 100)
    for passnum in range(len(sample_list) - 1, 0, -1):
        for i in range(passnum):
            if sample_list[i] > sample_list[i + 1]:
                temp = sample_list[i]
                sample_list[i] = sample_list[i + 1]
                sample_list[i + 1] = temp
    return sample_list

if __name__ == '__main__':
    # Four workers share the 3000 jobs instead of one process per job.
    pool = Pool(processes=4)
    for x in pool.imap_unordered(bubbleSort, (myArray for x in range(3000))):
        pass
I removed all the output and ran some tests on my 4-core machine. As expected, the code above was about four times faster than your sequential example.
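One further knob worth trying, since each bubbleSort call here is cheap: imap_unordered accepts a chunksize argument that hands jobs to the workers in batches instead of one at a time, which cuts down on inter-process traffic. A minimal sketch; the value of 50 is just an arbitrary starting point to tune:
if __name__ == '__main__':
    pool = Pool(processes=4)
    # chunksize=50 sends jobs to workers in batches of 50, so each
    # worker makes far fewer round trips to the shared task queue
    jobs = (myArray for x in range(3000))
    for result in pool.imap_unordered(bubbleSort, jobs, chunksize=50):
        pass
    pool.close()
    pool.join()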
Upvotes: 1
Reputation: 2517
Multiprocessing is not just magically faster: your computer still has to do the same total amount of work. Splitting that work across processes only helps if the pieces can actually run at the same time, and starting a process is not free.
In a "normal" program, doing it sequentially is easier to read and write. Multiprocessing is especially useful when you would otherwise be waiting, for example on web requests (you can send several at once instead of waiting for each in turn) or in some sort of event loop. The sequential version wins here because the parallel one pays heavy overhead: each of the 3000 processes has to be created and scheduled, the input list has to be copied into it, and the results have to be shuttled back through the Manager dict.
To put it in real-world terms: handing a task to somebody else only makes you faster if you do other things while they work, and if handing it over doesn't take longer than the task itself.
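To make the overhead point concrete, here is a minimal sketch that times a do-nothing function called directly versus run in a fresh Process; the exact numbers vary by machine and OS, but the gap is large:
from multiprocessing import Process
import time

def noop():
    pass

if __name__ == '__main__':
    # 1000 plain function calls: effectively instantaneous
    start = time.time()
    for i in range(1000):
        noop()
    print 'direct calls: %f seconds' % (time.time() - start)

    # 1000 short-lived processes: pays creation, scheduling and
    # join costs every time, the kind of cost the question's
    # version pays 3000 times over
    start = time.time()
    for i in range(1000):
        p = Process(target=noop)
        p.start()
        p.join()
    print 'one process per call: %f seconds' % (time.time() - start)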
Upvotes: 0