Hello lad

Reputation: 18790

multiprocessing does not work

I am working on Ubuntu 12 with 8 CPUs, as reported by the System Monitor.

The test code is:

    import multiprocessing as mp

    def square(x):
        return x**2


    if __name__ == '__main__':
        pool=mp.Pool(processes=4)
        pool.map(square,range(100000000))
        pool.close()
        # for i in range(100000000):
        #    square(i)

The problems are:

1) The whole workload seems to be scheduled on just one core, which gets close to 100% utilization, even though several processes are started. Occasionally the workload migrates to another core, but it is never distributed among them.

2) The version without multiprocessing is faster:

    for i in range(100000000):
        square(i)

I have read similar questions on Stack Overflow, such as "Python multiprocessing utilizes only one core", but none of the answers solved my problem.

Upvotes: 0

Views: 479

Answers (3)

PierreBdR

Reputation: 43234

The function you are using is far too short (i.e. it does not take enough time to compute), so you spend all your time on synchronization between processes, which has to be done serially (so it might as well happen on a single processor). Try this:

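    import multiprocessing as mp


    def square(x):
        # Useless but CPU-intensive loop, so each call takes a measurable time
        for i in range(10000):
            j = i**2
        return x**2


    if __name__ == '__main__':
        pool = mp.Pool(processes=4)
        pool.map(square, range(1000))
        pool.close()
        pool.join()
        # Serial version, for comparison:
        # for i in range(1000):
        #     square(i)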

You will see that multiprocessing now works well: the run takes about 2.5 seconds with the pool, versus about 10 seconds without it.

Note: If you are using Python 2, you might want to replace every range with xrange.

Edit: I replaced time.sleep by a CPU-intensive but useless calculation

Addendum: In general, for multi-CPU applications, you should try to make each CPU do as much work as possible without returning to the same process. In a case like yours, this means splitting the range into almost equal-sized lists, one per CPU, and sending them to the various CPUs.
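The splitting described in the addendum could be sketched roughly as follows. This is only an illustration, assuming Python 3 (where range objects can be pickled and sent to workers); `split_range` and `square_chunk` are hypothetical helper names, not part of multiprocessing.

```python
import multiprocessing as mp


def square(x):
    return x ** 2


def split_range(n, parts):
    """Split range(n) into `parts` contiguous ranges of almost equal size.

    Hypothetical helper for illustration only.
    """
    size, extra = divmod(n, parts)
    chunks = []
    start = 0
    for k in range(parts):
        # The first `extra` chunks get one additional element
        stop = start + size + (1 if k < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def square_chunk(chunk):
    """Process a whole chunk in one worker: one round trip per chunk."""
    return [square(x) for x in chunk]


if __name__ == '__main__':
    workers = 4
    chunks = split_range(1000000, workers)
    with mp.Pool(processes=workers) as pool:
        per_chunk = pool.map(square_chunk, chunks)
    # Flatten the per-chunk lists back into a single result list
    results = [y for part in per_chunk for y in part]
    print(len(results))
```

Each worker receives a single range and returns a single list, so the inter-process traffic is a handful of messages rather than one per number.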

Upvotes: 2

enrico.bacis

Reputation: 31474

When you do:

    pool.map(square, range(100000000))

Before the map function is invoked, range (in Python 2) has to create a list with 100000000 elements, and this is done by a single process. That is why you see a single core working.

Use xrange instead, which produces its values lazily rather than building the whole list first, and you should see the speedup:

    pool.map(square, xrange(100000000))
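As a related sketch, assuming Python 3 (where range is already lazy and xrange no longer exists): `Pool.imap_unordered` with an explicit `chunksize` hands items to the workers in batches and lets the parent consume results as they arrive, without building the full result list. The function name `sum_of_squares` and the values chosen for `workers` and `chunksize` are illustrative assumptions, not tuned recommendations.

```python
import multiprocessing as mp


def square(x):
    return x ** 2


def sum_of_squares(n, workers=4, chunksize=10000):
    """Sum x**2 for x in range(n) using a process pool.

    chunksize controls how many items are sent to a worker per task;
    the default here is an arbitrary illustrative value.
    """
    total = 0
    with mp.Pool(processes=workers) as pool:
        # Results arrive in arbitrary order, which is fine for a sum
        for y in pool.imap_unordered(square, range(n), chunksize=chunksize):
            total += y
    return total


if __name__ == '__main__':
    print(sum_of_squares(1000000))
```

Whether this beats a plain loop still depends on how expensive each call is. For a function as cheap as `square`, the serial version may well remain faster.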

Upvotes: 1

holdenweb

Reputation: 37003

It isn't sufficient simply to import the multiprocessing library to make use of multiple processes to schedule your work. You actually have to create processes too!

Your work is currently scheduled to a single core because you haven't done so, and so your program is a single process with a single thread.

Naturally, when you start a new process simply to square a number, you are going to get slower performance. The overhead of process creation makes sure of that. So your process pool will very likely take longer than a single-process run.
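One way to check the overhead on your own machine is to time both variants on the same input. The following is a minimal sketch, assuming Python 3; `timed`, `run_serial` and `run_pool` are hypothetical helper names, and the actual timings will vary with the hardware and the input size.

```python
import multiprocessing as mp
import time


def square(x):
    return x ** 2


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_serial(n):
    return [square(i) for i in range(n)]


def run_pool(n, workers=4):
    with mp.Pool(processes=workers) as pool:
        return pool.map(square, range(n))


if __name__ == '__main__':
    n = 200000
    serial, t_serial = timed(run_serial, n)
    pooled, t_pool = timed(run_pool, n)
    # Both variants must produce the same list of squares
    assert serial == pooled
    print('serial: %.3fs  pool: %.3fs' % (t_serial, t_pool))
```

The pool timing includes starting the worker processes and pickling every argument and result, which is the overhead in question.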

Upvotes: 0
