staticfloat

Reputation: 7040

multiprocessing.Pool processes locked to a single core

I'm using multiprocessing.Pool in Python on Ubuntu 12.04, and I'm running into a curious problem: when I call map_async on my Pool, I spawn 8 processes, but they all struggle for dominance over a single core of my 8-core machine. The exact same code uses both cores of my MacBook Pro and all four cores of my other Ubuntu 12.04 desktop (as measured with htop in all cases).

My code is too long to post in full, but the important part is:

P = multiprocessing.Pool()
results = P.map_async(unwrap_self_calc_timepoint, zip([self] * self.xLen, xrange(self.xLen))).get(99999999999)
P.close()
P.join()
ipdb.set_trace()

where unwrap_self_calc_timepoint is a wrapper function that passes the necessary self argument into a class method, based on the advice of this article.
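For context, a minimal sketch of what such a wrapper typically looks like (the calc_timepoint method name is an assumption, since the real method isn't shown): Pool workers can only receive picklable arguments, and bound methods aren't picklable in Python 2, so a module-level function unpacks each (self, index) pair and calls the method:

def unwrap_self_calc_timepoint(arg):
    # arg is one (self, index) tuple produced by the zip(...) above;
    # the method name calc_timepoint is hypothetical.
    obj, i = arg
    return obj.calc_timepoint(i)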

All three computers are using Python 2.7.3, and I don't really know where to start in hunting down why that one Ubuntu computer is acting up. Any advice on how to start narrowing the problem down would be appreciated. Thank you!

Upvotes: 5

Views: 4672

Answers (2)

Russ

Reputation: 3771

This seems to be a fairly common issue between numpy and certain Linux distributions. I haven't had any luck running taskset near the start of the program, but it does do the trick when called inside the code being parallelized:

import multiprocessing as mp
import numpy as np
import os

def something(arg):
    # arg (an element from np.arange(20)) is unused; it's just the item
    # handed to each worker by pool.map.
    # Re-enable all cores for this worker; importing numpy can leave the
    # process pinned to a single CPU on some Linux setups.
    os.system("taskset -p 0xfffff %d" % os.getpid())
    X = np.random.randn(5000, 2000)
    Y = np.random.randn(2000, 5000)
    Z = np.dot(X, Y)
    return Z.mean()

pool = mp.Pool(processes=10)
out = pool.map(something, np.arange(20))
pool.close()
pool.join()
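One way to check whether this is what is happening on the misbehaving machine (a hedged diagnostic sketch, not part of the original answer): print the process's CPU affinity mask before and after importing numpy. On affected systems the mask collapses from all cores down to a single core right after the import.

import os

# Prints something like "pid 1234's current affinity mask: ff" on a healthy setup
os.system("taskset -p %d" % os.getpid())

import numpy   # on affected setups this import resets the affinity mask

# On an affected machine this second call prints a mask like "1" (one core only)
os.system("taskset -p %d" % os.getpid())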

Upvotes: 3

user2660966

Reputation: 975

I had the same problem. In my case the solution was to tell Linux to use all of the processors instead of only one. Try adding the following two lines at the beginning of your code:

import os
os.system("taskset -p 0xfffff %d" % os.getpid())
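For reference, a hedged sketch of how this might slot into the question's code (the ordering is my assumption, not part of this answer; as the other answer notes, where the reset runs relative to the numpy import can matter):

import numpy as np   # the import that can shrink the affinity mask
import os
os.system("taskset -p 0xfffff %d" % os.getpid())   # restore access to all cores

import multiprocessing
P = multiprocessing.Pool()   # workers forked from here inherit the full mask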

Upvotes: 2
