DocDriven

Reputation: 3974

Timing a multiprocessing script

I've stumbled across a weird timing issue while using the multiprocessing module.

Consider the following scenario. I have functions like this:

import multiprocessing as mp

def workerfunc(x):
    # time hook 3
    ...  # do something with x
    # time hook 4

def outer():

    # do something

    mygen = ...  # some generator expression

    pool = mp.Pool(processes=8)

    # time hook 1
    result = [pool.apply(workerfunc, args=(x,)) for x in mygen]
    # time hook 2

if __name__ == '__main__':
    outer()

I am using the time module to get a rough idea of how long my functions run. I successfully create 8 separate processes, which terminate without error. The longest time for a worker to finish is about 130 ms (measured between time hook 3 and 4).
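For reference, the hooks are plain time module calls; each pair looks roughly like this (shown here for hooks 1 and 2, the print formatting is just for illustration):

import time

# time hook 1
t1 = time.perf_counter()
result = [pool.apply(workerfunc, args=(x,)) for x in mygen]
# time hook 2
t2 = time.perf_counter()
print('hook 1 to hook 2: %.0f ms' % ((t2 - t1) * 1000))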

I expected (as they are running in parallel) that the time between hook 1 and 2 would be approximately the same. Surprisingly, I get 600 ms as a result.

My machine has 32 cores and should be able to handle this easily. Can anybody give me a hint as to where this time difference comes from?

Thanks!

Upvotes: 0

Views: 926

Answers (2)

Pitto

Reputation: 8579

Since you are using multiprocessing and not multithreading, your performance issue is not related to the GIL (Python's Global Interpreter Lock).

I've found an interesting link that explains this with an example; you can find it at the bottom of this answer.

The GIL does not prevent a process from running on a different processor of a machine. It simply allows only one thread to run at a time within the interpreter.

So multiprocessing, not multithreading, will allow you to achieve true parallelism.

Let's verify this with some benchmarking, because experiencing it yourself is far more convincing than just reading about it.

import random
from threading import Thread
from multiprocessing import Process

size = 10000000   # number of random numbers to append to each list
threads = 2       # number of threads/processes to create

my_list = [[] for _ in range(threads)]  # one list per worker

def func(count, mylist):
    # CPU-bound work: append `count` random numbers to the given list
    for i in range(count):
        mylist.append(random.random())

def multithreaded():
    jobs = []
    for i in range(threads):
        thread = Thread(target=func, args=(size, my_list[i]))
        jobs.append(thread)
    # start the threads
    for j in jobs:
        j.start()
    # ensure all of the threads have finished
    for j in jobs:
        j.join()

def simple():
    # sequential baseline for comparison
    for i in range(threads):
        func(size, my_list[i])

def multiprocessed():
    processes = []
    for i in range(threads):
        # each child process works on its own copy of my_list[i]; the
        # parent's lists stay empty, but the CPU work still happens,
        # which is all that matters for this timing comparison
        p = Process(target=func, args=(size, my_list[i]))
        processes.append(p)
    # start the processes
    for p in processes:
        p.start()
    # ensure all processes have finished execution
    for p in processes:
        p.join()

if __name__ == "__main__":
    # run one variant at a time and measure it, e.g. with `time python bench.py`
    multithreaded()
    #simple()
    #multiprocessed()
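If you time each variant in turn (for example with the Unix time command), you will typically see that the threaded version is no faster than the simple sequential one for this CPU-bound loop, because the GIL lets only one thread execute Python bytecode at a time, while the process-based version can actually spread the work across cores.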

Additional information

Here you can find the source of this information and a more detailed technical explanation (bonus: there are also some Guido van Rossum quotes in it :) )

Upvotes: 1

Vince W.

Reputation: 3785

You are using pool.apply, which is blocking: each call waits for its result before the list comprehension submits the next one, so your workers actually run one after another. That is why the time between hooks 1 and 2 is roughly the sum of the individual worker times rather than the longest single one. Use pool.apply_async instead; then the function calls will all run in parallel, and each call immediately returns an AsyncResult object. You can use this object to check when a process is done and then also to retrieve its result.
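A minimal sketch of the non-blocking version (the sleep and the range are hypothetical stand-ins for your real work and your generator expression):

import multiprocessing as mp
import time

def workerfunc(x):
    time.sleep(0.13)   # stand-in for ~130 ms of real work
    return x * x

def outer():
    mygen = range(8)   # stand-in for your generator expression
    pool = mp.Pool(processes=8)
    # submit everything first; each call returns an AsyncResult immediately
    async_results = [pool.apply_async(workerfunc, args=(x,)) for x in mygen]
    # .get() blocks until that individual result is ready
    result = [r.get() for r in async_results]
    pool.close()
    pool.join()
    return result

if __name__ == '__main__':
    start = time.perf_counter()
    outer()
    print('elapsed: %.0f ms' % ((time.perf_counter() - start) * 1000))

With eight workers of roughly 130 ms each, the elapsed time should now be close to a single worker's runtime instead of their sum.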

Upvotes: 1
