杨梓东

Reputation: 25

Poor Python Multiprocessing Performance

I attempted to speed up my Python program using the multiprocessing module, but I found it was quite slow. A toy example is as follows:

import time
from multiprocessing import Pool, Manager

class A:
    def __init__(self, i):
        self.i = i

    def score(self, x):
        return self.i - x


class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []

    def run_1(self):
        for i in self.i_list:
            self.x = i
            map(self.compute, self.A_list) #map version
            self.A_list.append(A(i))

    def run_2(self):
        p = Pool()
        for i in self.i_list:
            self.x = i
            p.map(self.compute, self.A_list) #multicore version
            self.A_list.append(A(i))

    def compute(self, some_A):
        return some_A.score(self.x)

if __name__ == "__main__":
    st = time.time()
    foo = B()
    foo.run_1()
    print("Map: ", time.time()-st)
    st = time.time()
    foo = B()
    foo.run_2()
    print("MultiCore: ", time.time()-st) 

The output on my computer (Windows 10, Python 3.5) is:

Map: 0.0009996891021728516

MultiCore: 19.34994912147522

Similar results can be observed on a Linux machine (CentOS 7, Python 3.6).

I guess it is caused by the pickling/unpickling of objects between processes? I tried to use the Manager class from multiprocessing but failed to get it to work.

Any help would be appreciated.

Upvotes: 1

Views: 1216

Answers (2)

Michael Smith

Reputation: 75

The reasoning behind this is:

  1. The map call is performed by the main process.

* Meaning, when foo.run_1() is called, the main process is mapping for itself, much like telling yourself what to do.

* When foo.run_2() is called, the main process is mapping across the maximum number of worker processes the machine supports. If your maximum is 6, the main process is coordinating 6 worker processes, much like organizing 6 people to tell you something, and that coordination has a cost (see the sketch below).
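You can see this directly by asking each task which process handled it; a minimal standalone sketch (the report helper here is illustrative, not from the question):

import os
from multiprocessing import Pool

def report(_):
    # return the PID of the worker process that handled this item
    return os.getpid()

if __name__ == "__main__":
    with Pool() as p:  # with no argument, Pool starts os.cpu_count() workers
        pids = set(p.map(report, range(100)))
    print(len(pids), "worker processes did the mapping")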

Side note: if you use:

p.imap(self.compute, self.A_list)

the results are yielded lazily, in the same order as A_list.
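As a minimal standalone sketch of that ordering behavior (the square helper is illustrative, not from the question):

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(4) as p:
        # imap yields results lazily, in the same order as the input iterable
        print(list(p.imap(square, range(10))))   # [0, 1, 4, ..., 81]
        # imap_unordered yields each result as soon as a worker finishes,
        # so the order can differ from the input order
        print(list(p.imap_unordered(square, range(10))))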

Upvotes: 0

Pixou

Reputation: 1769

Wow, that's impressive (and slow!).

Yes, this is because the objects must be pickled and shipped to the workers on every call, which is costly.

So I played with it a little and managed to gain a lot of performance by making the compute method a module-level (static) function. Basically, you don't need to share the B object instance with the workers anymore. Still very slow, but better.

import time
from multiprocessing import Pool

class A:
    def __init__(self, i):
        self.i = i

    def score(self, x):
        return self.i - x

x = 0

def static_compute(some_A):
    # reads the module-level x, set by the caller before each map
    return some_A.score(x)


class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []

    def run_1(self):
        global x
        for i in self.i_list:
            x = i  # rebind the module-level x, not a new local
            map(static_compute, self.A_list)  # map version
            self.A_list.append(A(i))

    def run_2(self):
        global x
        p = Pool(4)
        for i in self.i_list:
            # caveat: the workers copied x when the Pool started, so this
            # rebinding is not visible to them; see the partial() sketch below
            x = i
            p.map(static_compute, self.A_list)  # multicore version
            self.A_list.append(A(i))
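One caveat with the module-level x trick: each worker process gets its own copy of the globals when the Pool starts, so rebinding x in the parent afterwards never reaches the workers. A sketch of one workaround, assuming you want each iteration's x to reach the workers: pass it explicitly with functools.partial, so it travels with each pickled task (this variant is mine, not from the timing test above):

from functools import partial
from multiprocessing import Pool

class A:
    def __init__(self, i):
        self.i = i

    def score(self, x):
        return self.i - x

def static_compute(some_A, x):
    return some_A.score(x)

if __name__ == "__main__":
    A_list = [A(i) for i in range(1000)]
    with Pool(4) as p:
        for i in range(1000):
            # partial binds the current x into the callable, so each task
            # carries its own x and no shared global is needed
            results = p.map(partial(static_compute, x=i), A_list)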

The other reason it is slow, to me, is the fixed cost of using Pool. You're actually calling Pool.map 1000 times. If there is a fixed cost associated with each of those dispatches, that would make the overall strategy slow. Maybe you should test that with a longer A_list (longer than the i_list, which requires a different algorithm).
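To get a feel for that fixed cost, here is a rough micro-benchmark sketch (the work function and sizes are illustrative; absolute numbers will vary by machine). It compares one big p.map call with many small ones over the same data:

import time
from multiprocessing import Pool

def work(n):
    return n * n

if __name__ == "__main__":
    data = list(range(100000))
    with Pool(4) as p:
        st = time.time()
        p.map(work, data)  # one dispatch covering all 100000 items
        print("one big map:    ", time.time() - st)

        st = time.time()
        for start in range(0, len(data), 100):
            p.map(work, data[start:start + 100])  # 1000 separate dispatches
        print("many small maps:", time.time() - st)

If the per-call overhead dominates, the second loop should come out noticeably slower even though it processes exactly the same items.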

Upvotes: 1
