scrx2

Reputation: 2282

Testing Python multiprocessing: low speed because of overhead?

I'm trying to learn about multiprocessing in Python (2.7). My CPU has 4 cores. In the following code I compare the speed of parallel vs. serial execution of the same basic instruction.

I find that the time taken using the 4 cores is only 0.67 times the time taken by a single core, while naively I'd expect ~0.25.

Is overhead the reason? Where does it come from? Aren't the 4 processes independent?

I also tried pool.map and pool.map_async, with very similar results in terms of speed.

from multiprocessing import Process
import time

def my_process(a):
    for i in range(0,a[1]):
        j=0
        while j<10000:
            j = j+1
    print(a,j)

if __name__ == '__main__':
    # arguments to pass:
    a = ((0,2000),(1,2000),(2,2000),(3,2000))

    # --- 1) parallel processes:
    # 4 cores go up to 100% each here
    t0 = time.time()
    proc1 = Process(target=my_process, args=(a[0],))
    proc2 = Process(target=my_process, args=(a[1],))
    proc3 = Process(target=my_process, args=(a[2],))
    proc4 = Process(target=my_process, args=(a[3],))
    proc1.start(); proc2.start(); proc3.start(); proc4.start()
    proc1.join() ; proc2.join() ; proc3.join() ; proc4.join()
    dt_parallel = time.time()-t0
    print("parallel : " + str(dt_parallel))

    # --- 2) serial process :
    # 1 core only goes up to 100%
    t0 = time.time()
    for k in a:
        my_process(k)
    dt_serial = time.time()-t0
    print("serial : " + str(dt_serial))

    print("t_par / t_ser = " + str(dt_parallel/dt_serial))

EDIT: my PC actually has 2 physical cores (2 = 2 cores per socket * 1 socket, from lscpu [thanks @goncalopp]). If I run the above script with only the first 2 processes, I get a ratio of 0.62, not very different from the one obtained with 3 or 4 processes. I guess it won't be easy to go faster than that.

I also tested on another PC, where lscpu reports CPU(s): 32, Thread(s) per core: 2, Core(s) per socket: 8, Socket(s): 2, and there I get a ratio of 0.34, similar to @dano.

Thanks for your help

Upvotes: 4

Views: 2508

Answers (1)

loopbackbee

Reputation: 23322

Yes, this may be related to overhead, including:

  • Creating and starting the processes
  • Passing the function and the arguments over to them
  • Waiting for process termination
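One rough way to see how much these steps cost on a given machine is to time processes that do no work at all. The sketch below is only illustrative: `do_nothing` and `spawn_and_join` are names made up for this example, and the numbers will vary with the OS and the process start method.

```python
from multiprocessing import Process
import time

def do_nothing():
    # Hypothetical no-op target: whatever time we measure is pure overhead.
    pass

def spawn_and_join(n_procs):
    """Return the seconds spent creating, starting and joining n_procs no-op processes."""
    t0 = time.time()
    procs = [Process(target=do_nothing) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - t0

if __name__ == '__main__':
    print("overhead for 4 processes: %.4f s" % spawn_and_join(4))
```

If this figure is small compared with the serial run time, process startup alone is unlikely to explain the whole gap.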

If you truly have 4 physical cores on your machine (and not 2 cores with hyperthreading or similar), you should see the ratio get closer to the expected value for larger inputs, as chepner said. If you only have 2 physical cores, you can't expect a ratio much below 0.5.
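To check this, you could repeat the measurement for growing workloads and compare against a naive lower bound. This is a minimal sketch, not a rigorous benchmark: `busy_loop`, `timed` and `ideal_ratio` are hypothetical helpers, `busy_loop` only imitates the loop in the question, and `ideal_ratio` assumes that each physical core runs one process at full speed and that hyperthreading adds nothing.

```python
from multiprocessing import Process
import time

def busy_loop(n_outer):
    # CPU-bound work of the same shape as my_process in the question.
    j = 0
    for _ in range(n_outer):
        j = 0
        while j < 10000:
            j += 1
    return j

def ideal_ratio(n_procs, n_cores):
    # Naive bound (assumption): at most min(n_procs, n_cores) tasks run at once.
    return 1.0 / min(n_procs, n_cores)

def timed(n_outer, n_procs, parallel):
    """Time n_procs runs of busy_loop, either in separate processes or one after another."""
    t0 = time.time()
    if parallel:
        procs = [Process(target=busy_loop, args=(n_outer,)) for _ in range(n_procs)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
    else:
        for _ in range(n_procs):
            busy_loop(n_outer)
    return time.time() - t0

if __name__ == '__main__':
    n_cores = 2  # assumption: physical cores, e.g. as reported by lscpu
    for n_outer in (10, 100, 400):
        ratio = timed(n_outer, 4, True) / timed(n_outer, 4, False)
        print("n_outer=%d  t_par/t_ser=%.2f  (naive bound: %.2f)"
              % (n_outer, ratio, ideal_ratio(4, n_cores)))
```

For very small `n_outer` the fixed startup cost dominates and the ratio can even exceed 1; as the workload grows, it should drift toward the bound set by the number of physical cores.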

Upvotes: 3
