TheOnlyOne11

Reputation: 23

Multiprocessing in Python not faster than doing it sequentially

I want to do something in parallel, but it always runs slower. I've included two code snippets that can be compared. The multiprocessing version takes 12 seconds on my laptop; the sequential version takes only 3 seconds. I thought multiprocessing was supposed to be faster. I know the task itself doesn't make any sense; it is just constructed to compare the two approaches. I also know that bubble sort can be replaced by faster sorting algorithms.

Thanks.

Multiprocessing way:

from multiprocessing import Process, Manager
import os
import random

myArray = []

for i in range(1000):
    myArray.append(random.randint(1,1000))

def getRandomSample(myset, sample_size):
    sorted_list = sorted(random.sample(xrange(len(myset)), sample_size))
    return([myset[i] for i in sorted_list])

def bubbleSort(iterator,alist, return_dictionary):

    sample_list = (getRandomSample(alist, 100))

    for passnum in range(len(sample_list)-1,0,-1):
        for i in range(passnum):
            if sample_list[i] > sample_list[i+1]:
                sample_list[i], sample_list[i+1] = sample_list[i+1], sample_list[i]
    return_dictionary[iterator] = sample_list    

if __name__ == '__main__':
    manager = Manager()
    return_dictionary = manager.dict()
    jobs = []
    for i in range(3000):
        p = Process(target=bubbleSort, args=(i,myArray,return_dictionary))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()
    print return_dictionary.values()

The other way:

import os
import random

myArray = []

for i in range(1000):
    myArray.append(random.randint(1,1000))

def getRandomSample(myset, sample_size):
    sorted_list = sorted(random.sample(xrange(len(myset)), sample_size))
    return([myset[i] for i in sorted_list])


def bubbleSort(alist):

    sample_list = (getRandomSample(alist, 100))

    for passnum in range(len(sample_list)-1,0,-1):
        for i in range(passnum):
            if sample_list[i] > sample_list[i+1]:
                sample_list[i], sample_list[i+1] = sample_list[i+1], sample_list[i]
    return(sample_list)

if __name__ == '__main__':
    results = []
    for i in range(3000):
        results.append(bubbleSort(myArray))
    print results

Upvotes: 2

Views: 1434

Answers (2)

niemmi

Reputation: 17263

Multiprocessing is faster if you have multiple cores and do the parallelization properly. In your example you create 3000 processes, which causes an enormous amount of context switching between them. Instead, use Pool to schedule the jobs across a fixed number of worker processes:

from multiprocessing import Pool

def bubbleSort(alist):

    sample_list = (getRandomSample(alist, 100))

    for passnum in range(len(sample_list)-1,0,-1):
        for i in range(passnum):
            if sample_list[i] > sample_list[i+1]:
                sample_list[i], sample_list[i+1] = sample_list[i+1], sample_list[i]
    return(sample_list)

if __name__ == '__main__':
    pool = Pool(processes=4)
    for x in pool.imap_unordered(bubbleSort, (myArray for x in range(3000))):
        pass

I removed all the output and did some tests on my 4 core machine. As expected the code above was about 4 times faster than your sequential example.
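To illustrate the same point in isolation: for CPU-bound work the speedup only appears when each task is big enough to outweigh the cost of shipping it to a worker process. Here is a minimal Python 3 sketch (`square`, `run_parallel`, the 4-worker count, and the `chunksize` value are made-up placeholders for illustration, not code from the answer above):

```python
from multiprocessing import Pool

def square(n):
    # stand-in for a CPU-bound task; in the answer above this role
    # is played by bubbleSort
    return n * n

def run_parallel(values, processes=4):
    # chunksize batches tasks so each worker receives work in larger
    # pieces, reducing inter-process communication overhead
    with Pool(processes=processes) as pool:
        return pool.map(square, values, chunksize=100)

if __name__ == '__main__':
    print(run_parallel(range(1000))[:3])  # → [0, 1, 4]
```

The key difference from the question's code is that the pool reuses a handful of long-lived workers instead of paying the startup cost of 3000 separate processes.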

Upvotes: 1

Matthias Schreiber

Reputation: 2517

Multiprocessing is not just magically faster. Your computer still has to do the same total amount of work. It's like trying to do multiple tasks at once yourself: that alone doesn't make you faster.

In a "normal" CPU-bound program, doing it sequentially is easier to read and write (that it is that much faster too surprises me a little). Multiprocessing is especially useful when you have to wait for something external, like a web request (you can send multiple at once and don't have to wait for each one in turn), or when you are running some sort of event loop. My guess as to why the sequential version is faster here is that Python already uses parallelism internally wherever it makes sense (don't quote me on that). Also, with threading the interpreter has to keep track of what is where, which means more overhead.
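To make the waiting case concrete, here is a small Python 3 sketch. The `fake_request` function and the 0.1-second sleep are made up to stand in for a slow web request; the point is that threads spend their time waiting, so the waits overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(url):
    # simulates network latency: the thread just sleeps, so the
    # other threads' waits can run at the same time
    time.sleep(0.1)
    return "response from " + url

urls = ["a", "b", "c", "d"]

start = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
    responses = list(executor.map(fake_request, urls))
elapsed = time.time() - start

# the four 0.1 s waits overlap, so the whole batch finishes in
# roughly 0.1 s instead of the 0.4 s a sequential loop would need
print(responses[0])  # → response from a
```

For CPU-bound work like the bubble sort in the question, this kind of overlap doesn't exist, which is why parallelizing it only pays off when the per-task work outweighs the overhead.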

So, to return to the real-world analogy: if you hand a task to somebody else and, instead of waiting for them, do other things at the same time, then you finish faster.

Upvotes: 0
