Reputation: 2282
I'm trying to learn about multiprocessing
in python (2.7). My CPU has 4 cores. In the following code I test speed of parallel Vs serial execution of the same basic instruction.
I find that the time taken using the 4 cores is only 0.67 the one taken by only one core, while naively I'd expect ~0.25.
Is overhead the reason? where does it come from? Are not the 4 processes independent?
I also tried pool.map
and pool.map_async
, with very similar results in terms of speed.
from multiprocessing import Process
import time
def my_process(a):
for i in range(0,a[1]):
j=0
while j<10000:
j = j+1
print(a,j)
if __name__ == '__main__':
# arguments to pass:
a = ((0,2000),(1,2000),(2,2000),(3,2000))
# --- 1) parallel processes:
# 4 cores go up to 100% each here
t0 = time.time()
proc1 = Process(target=my_process, args=(a[0],))
proc2 = Process(target=my_process, args=(a[1],))
proc3 = Process(target=my_process, args=(a[2],))
proc4 = Process(target=my_process, args=(a[3],))
proc1.start(); proc2.start(); proc3.start(); proc4.start()
proc1.join() ; proc2.join() ; proc3.join() ; proc4.join()
dt_parallel = time.time()-t0
print("parallel : " + str(dt_parallel))
# --- 2) serial process :
# 1 core only goes up to 100%
t0 = time.time()
for k in a:
my_process(k)
dt_serial = time.time()-t0
print("serial : " + str(dt_serial))
print("t_par / t_ser = " + str(dt_parallel/dt_serial))
EDIT my PC has actually 2 physical cores (2 = 2 cores per socket * 1 sockets, from lscpu
[thanks @goncalopp]). If I run the above script with only the first 2 processes I get a ratio of 0.62, not that different to the one obtained with 3 or 4 processes. I guess it won't be easy to go faster than that.
I tested on another PC with lscpu
: CPU(s):32, Thread(s) per core: 2, core(s) per socket: 8, Socket(s): 2, and I get a ratio of 0.34, similar to @dano.
Thanks for your help
Upvotes: 4
Views: 2508
Reputation: 23322
Yes, this may be related to overhead, including:
If you truly have 4 physical cores on your machine (and not 2 cores with hyperthreading or similar), you should see that the ratio becomes closer to what is expected for larger inputs, as chepner said. If you only have 2 physical cores, you can't get ratio < 0.5
Upvotes: 3