Rafael S. Calsaverini
Rafael S. Calsaverini

Reputation: 14022

Poor division of labor using multiprocessing in python?

Consider the following program:

import Queue
from multiprocessing import Queue as Q
from multiprocessing import Process, Value, Array
from time import time
from numpy import sum, zeros
from numpy.random import normal

def enche(queue, cnts, tims, t0, proc):
    while True:
        try:
            queue.get(False)
            x = sum(normal(size = 1000))
            cnts[proc] = cnts[proc] + 1
        except Queue.Empty:
            tims[proc] = time() - t0.value
            break

def main(m, nth):
    queue = Q()
    tmp = [queue.put(0) for i in xrange(m)]
    cnts = Array('i', zeros(nth, dtype = "int"))
    tims = Array('d', zeros(nth))
    t0  = Value('d', time())
    procs = [ Process(target = enche , args = (queue,cnts, tims, t0,i)) for i in xrange(nth) ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for k in xrange(nth):
        print "%2.0f/100, %4.1f ms,"%(cnts[k]/float(m) * 100, tims[k]*1000), 
    print "" 


if __name__ == "__main__":
    for k in xrange(100):
        main(100, 4)

In essence, the function main(m,nth) should do a task - sum a thousand normal random numbers m times - and divide this task in nth processes.

I then count and print what percentage of the task was done by each process and how long each process lasted. This the typical results I get:

 4/100,  1.6 ms,  3/100,  2.2 ms, 88/100, 10.9 ms,  5/100,  2.7 ms, 
23/100,  3.6 ms,  1/100,  1.5 ms,  3/100,  2.0 ms, 73/100, 14.9 ms, 
 8/100,  3.9 ms,  1/100,  3.4 ms,  8/100,  5.3 ms, 83/100, 21.0 ms, 
 6/100,  4.1 ms,  0/100,  3.2 ms, 76/100, 19.0 ms, 18/100,  8.8 ms, 
 5/100,  1.7 ms, 90/100,  9.2 ms,  5/100,  2.4 ms,  0/100,  2.6 ms, 
 2/100,  1.5 ms,  8/100,  2.4 ms, 89/100,  8.9 ms,  1/100,  2.6 ms, 
 8/100,  2.3 ms, 11/100,  2.7 ms,  3/100,  2.4 ms, 78/100,  8.5 ms, 
52/100,  5.8 ms, 11/100,  3.2 ms,  4/100,  2.6 ms, 33/100,  5.9 ms, 
 6/100,  2.3 ms,  8/100,  2.9 ms, 24/100,  4.3 ms, 62/100,  7.7 ms, 
13/100,  3.0 ms, 11/100,  2.6 ms,  1/100,  2.5 ms, 75/100,  8.5 ms, 
 6/100,  2.1 ms,  3/100,  2.1 ms,  6/100,  2.7 ms, 85/100,  9.0 ms, 
 8/100,  2.3 ms,  3/100,  2.0 ms, 27/100,  4.4 ms, 62/100,  7.6 ms, 
18/100,  3.2 ms,  0/100,  1.9 ms,  6/100,  3.1 ms, 76/100,  9.2 ms, 
 3/100,  1.9 ms, 19/100,  3.4 ms, 30/100,  6.0 ms, 48/100,  7.9 ms, 
10/100,  2.4 ms,  0/100,  1.5 ms,  8/100,  2.8 ms, 82/100,  8.8 ms, 
 6/100,  1.8 ms, 36/100,  4.7 ms, 32/100,  4.7 ms, 26/100,  4.7 ms, 
 8/100,  2.0 ms, 46/100,  5.9 ms, 46/100,  5.9 ms,  0/100,  2.6 ms, 
 3/100,  1.6 ms, 54/100,  6.2 ms,  1/100,  2.4 ms, 42/100,  6.1 ms, 
12/100,  2.6 ms,  8/100,  2.4 ms, 34/100,  5.3 ms, 46/100,  6.3 ms, 
 1/100,  1.4 ms,  9/100,  2.3 ms,  1/100,  2.3 ms, 89/100,  9.2 ms, 
13/100,  2.6 ms,  5/100,  2.2 ms,  3/100,  2.4 ms, 79/100,  8.7 ms, 
11/100,  2.3 ms,  7/100,  2.6 ms,  8/100,  2.9 ms, 74/100,  8.8 ms, 
 8/100,  1.8 ms,  9/100,  2.7 ms,  0/100,  1.8 ms, 83/100,  9.3 ms, 
13/100,  2.4 ms,  4/100,  2.0 ms,  3/100,  2.5 ms, 80/100,  9.9 ms, 
10/100,  2.3 ms, 72/100,  8.0 ms,  6/100,  2.6 ms, 12/100,  4.1 ms, 
27/100,  4.3 ms,  0/100,  1.5 ms,  5/100,  2.5 ms, 68/100,  8.4 ms, 
 3/100,  1.5 ms,  8/100,  2.3 ms, 86/100,  8.3 ms,  3/100,  2.7 ms, 
 7/100,  1.9 ms, 10/100,  2.5 ms,  5/100,  2.5 ms, 78/100,  8.2 ms, 
14/100,  2.5 ms,  3/100,  2.0 ms,  5/100,  2.5 ms, 78/100,  8.2 ms, 
53/100,  7.4 ms,  3/100,  1.9 ms,  1/100,  2.4 ms, 43/100,  7.5 ms, 
 3/100,  1.7 ms,  9/100,  2.5 ms, 47/100,  6.0 ms, 41/100,  6.0 ms, 
 3/100,  1.5 ms, 37/100,  4.7 ms, 33/100,  4.7 ms, 27/100,  4.7 ms, 
 7/100,  1.8 ms, 36/100,  4.5 ms, 32/100,  4.5 ms, 25/100,  4.5 ms, 
 3/100,  1.4 ms, 51/100,  5.6 ms, 46/100,  5.6 ms,  0/100,  2.3 ms, 
 5/100,  1.8 ms,  9/100,  2.5 ms, 46/100,  5.8 ms, 40/100,  5.9 ms, 
 9/100,  2.0 ms,  3/100,  1.8 ms,  5/100,  2.3 ms, 83/100,  8.8 ms, 
 7/100,  2.1 ms,  3/100,  2.1 ms,  8/100,  2.9 ms, 82/100,  8.6 ms, 
11/100,  2.3 ms,  4/100,  2.0 ms,  3/100,  2.5 ms, 82/100,  9.3 ms, 
19/100,  2.9 ms,  9/100,  2.6 ms, 10/100,  3.0 ms, 62/100,  7.3 ms, 
18/100,  2.8 ms, 16/100,  2.9 ms,  2/100,  2.5 ms, 64/100,  7.9 ms, 
 2/100,  1.5 ms, 51/100,  5.8 ms, 47/100,  5.9 ms,  0/100,  2.4 ms, 
26/100,  4.0 ms,  8/100,  2.3 ms, 11/100,  2.9 ms, 55/100,  7.6 ms, 
 8/100,  2.1 ms,  3/100,  1.9 ms, 79/100,  9.2 ms, 10/100,  3.8 ms, 
14/100,  2.5 ms,  4/100,  2.1 ms, 45/100,  5.6 ms, 37/100,  5.6 ms, 
11/100,  2.2 ms,  9/100,  2.8 ms,  4/100,  2.4 ms, 76/100,  8.9 ms, 
 3/100,  1.4 ms,  4/100,  2.1 ms,  5/100,  2.4 ms, 88/100,  9.5 ms, 
 3/100,  1.4 ms,  7/100,  2.4 ms, 49/100,  5.9 ms, 41/100,  5.9 ms, 
 3/100,  1.4 ms,  3/100,  1.8 ms, 92/100,  9.3 ms,  2/100,  2.6 ms, 
 9/100,  2.3 ms, 16/100,  3.1 ms, 41/100,  5.5 ms, 34/100,  5.5 ms, 
 5/100,  1.7 ms,  8/100,  2.3 ms, 47/100,  5.8 ms, 40/100,  5.7 ms, 
 6/100,  1.7 ms,  5/100,  2.2 ms,  5/100,  2.4 ms, 84/100,  8.6 ms, 
 7/100,  2.0 ms,  6/100,  2.2 ms, 47/100,  5.9 ms, 40/100,  5.8 ms, 
 5/100,  1.9 ms, 32/100,  5.1 ms,  6/100,  2.7 ms, 57/100,  7.7 ms, 
11/100,  2.4 ms, 14/100,  2.7 ms, 42/100,  5.3 ms, 33/100,  5.3 ms, 
50/100,  6.6 ms, 12/100,  2.8 ms,  0/100,  1.9 ms, 38/100,  6.6 ms, 
55/100,  7.0 ms,  1/100,  1.7 ms,  2/100,  2.2 ms, 42/100,  6.9 ms, 
 8/100,  1.9 ms, 48/100,  6.7 ms,  5/100,  2.4 ms, 39/100,  6.7 ms, 
12/100,  2.3 ms, 18/100,  3.0 ms, 30/100,  5.6 ms, 40/100,  6.8 ms, 
 5/100,  1.7 ms,  3/100,  1.9 ms,  5/100,  2.5 ms, 87/100,  8.8 ms, 
50/100,  6.7 ms, 13/100,  2.7 ms,  0/100,  2.0 ms, 37/100,  6.7 ms, 

Why do I get such a poor division of labor? Why sometimes a process last for short times and do nothing and others do almost 90% of the job?

What should I do to improve this?

edit: extra info - I run this in a i7 CPU.

Upvotes: 2

Views: 258

Answers (2)

JC Plessis
JC Plessis

Reputation: 693

As Maxim mentionned, your tasks are too small for the OS to swap each process in turn and adjust allocated CPU time for each process. The longer the task, the more each process will have access to the processor.

You might also check the real CPU time that has been allocated to current process by using os.times()

It allows you to know how many real CPU time has been allocated to each process. Of course the process can't do any work if it is not allocated any CPU time :)

Upvotes: 1

Maxim Egorushkin
Maxim Egorushkin

Reputation: 136306

The task it does in parallel is rather small compared to process start overhead, so that one process can do many iterations before other processes get a chance to start running.

If the line:

x = sum(normal(size = 1000))

is changed to:

x = sum(normal(size = 100000))

This is what I get:

25/100, 366.2 ms, 25/100, 364.4 ms, 25/100, 364.1 ms, 25/100, 361.7 ms, 
34/100, 362.1 ms,  0/100,  2.6 ms, 25/100, 357.6 ms, 41/100, 357.8 ms, 
25/100, 365.2 ms, 26/100, 362.7 ms, 25/100, 366.0 ms, 24/100, 355.7 ms, 
21/100, 360.3 ms, 21/100, 355.9 ms, 22/100, 362.3 ms, 36/100, 358.0 ms, 

Upvotes: 5

Related Questions