user8502432

Python's parallel processing

I am in the following setting: I have a method that takes an objective function f as input. As a subroutine of that method I want to evaluate f on a small set of points. Since f has high complexity I considered doing that in parallel. All online examples hang even for trivial functions like squaring on a set of 5 points. They use the multiprocessing library, and I don't know what I am doing wrong. I am also not sure how to encapsulate the if __name__ == "__main__" statement in my method (since it is part of a module, I guess I should use the module name instead of "__main__"?).

The code I have been using looks like this:

from multiprocessing.pool import Pool
from multiprocessing import cpu_count

x = [1,2,3,4,5]
num_cores = cpu_count()
def f(x):
    return x**2

if __name__ == "__main__":
    pool = Pool(num_cores)
    y = list(pool.map(f, x))
    pool.join()
    print(y)

When I execute this code in Spyder it takes a bloody long time to finish.

So my main questions are: What am I doing wrong in this code? How can I encapsulate the if __name__ == "__main__" statement when this code is part of a bigger method? Is it even worth parallelizing this? (One function evaluation can take multiple minutes, and in serial this adds up to a total runtime of hours...)

Upvotes: 0

Views: 55

Answers (2)

Corentin Limier

Reputation: 5006

According to the documentation:

close()

Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit.

terminate()

Stops the worker processes immediately without completing outstanding work. When the pool object is garbage collected terminate() will be called immediately.

join()

Wait for the worker processes to exit. One must call close() or terminate() before using join().

So you should add pool.close() before pool.join():

from multiprocessing.pool import Pool

x = [1, 2, 3, 4, 5]

def f(x):
    return x**2

if __name__ == "__main__":
    pool = Pool()
    y = list(pool.map(f, x))
    pool.close()  # no more tasks will be submitted; workers exit once the outstanding work is done
    pool.join()   # wait for the worker processes to exit
    print(y)

You can call Pool() without any arguments and it will use cpu_count() by default:

If processes is None then the number returned by cpu_count() is used
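
Note that the pool can also be used as a context manager (Python 3.3+): leaving the with block calls terminate(), which is fine here because map() blocks until all results are back. A minimal sketch, as an alternative to the explicit close()/join():

from multiprocessing.pool import Pool

def f(x):
    return x**2

if __name__ == "__main__":
    with Pool() as pool:  # __exit__() calls terminate() when the block is left
        y = pool.map(f, [1, 2, 3, 4, 5])  # map() blocks until every result is ready
    print(y)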

About the if __name__ == "__main__" part, read more information here.

So you need to think a bit about which code you want executed only in the main program. The most obvious example is that you want code that creates child processes to run only in the main program - so that should be protected by __name__ == '__main__'.
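
Applied to your setting, a minimal sketch (the file and function names optimizer.py, evaluate_parallel and main.py are made up for illustration): keep the pool creation inside an ordinary function of your module, and put the guard only in the top-level script you actually run, because that is the module the child processes re-import under the spawn start method (e.g. on Windows).

# optimizer.py -- hypothetical module containing your method
from multiprocessing.pool import Pool

def evaluate_parallel(f, points, processes=None):
    """Evaluate f on each point using a pool of worker processes."""
    pool = Pool(processes)  # Pool(None) uses cpu_count() workers
    try:
        return pool.map(f, points)  # blocks until every result is ready
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to exit

# main.py -- the script you actually run
from optimizer import evaluate_parallel

def f(x):
    return x**2

if __name__ == "__main__":  # guard only the entry-point code
    print(evaluate_parallel(f, [1, 2, 3, 4, 5]))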

Upvotes: 1

arjoonn

Reputation: 988

You might want to look into the chunksize argument of the map function that you are using.

On a large enough input list, a lot of your time is spent simply communicating the arguments to and from the separate parallel processes.

One symptom of this problem is that when you use something like htop, all cores are firing but at < 100%.
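
For example, a hedged sketch (the input size and chunksize=100 are arbitrary and worth tuning for your workload):

from multiprocessing.pool import Pool

def f(x):
    return x ** 2

if __name__ == "__main__":
    xs = list(range(100000))
    with Pool() as pool:
        # hand the inputs to the workers in batches of 100 instead of relying on
        # the automatically chosen default, reducing inter-process communication
        ys = pool.map(f, xs, chunksize=100)
    print(ys[:5])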

Upvotes: 0
