I want to use Python multiprocessing to run a grid search for a predictive model. When I look at core usage, it always seems to be using only one core. Any idea what I'm doing wrong?
import multiprocessing
import itertools
from operator import itemgetter
from sklearn import svm

# First read some data (not shown):
# X will be my feature NumPy 2D array
# y will be my 1D NumPy array of labels
# SEED and my_cross_validation_function are also defined elsewhere

# Define the grid
C = [0.1, 1]
gamma = [0.0]
params = [C, gamma]
grid = list(itertools.product(*params))
GRID_hx = []

def worker(par, grid_list):
    # Define a sklearn model for this (C, gamma) combination
    clf = svm.SVC(C=par[0], gamma=par[1], probability=True, random_state=SEED)
    # Run a cross-validation function: returns the error
    ll = my_cross_validation_function(X, y, model=clf, n=1, test_size=0.2)
    print(par, ll)
    grid_list.append((par, ll))

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    GRID_hx = manager.list()
    jobs = []
    for g in grid:
        p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
        jobs.append(p)
        p.start()
        p.join()

    print("\n-------------------")
    print("SORTED LIST")
    print("-------------------")
    L = sorted(GRID_hx, key=itemgetter(1))
    for l in L[:5]:
        print(l)
I'd say:
jobs = []
for g in grid:
    p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
    jobs.append(p)
    p.start()
for p in jobs:
    p.join()
Currently you're spawning a job, waiting for it to finish, and only then moving on to the next one.
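To see the difference in practice, here is a minimal sketch that times both loop shapes. It assumes a hypothetical `slow_task` worker that just sleeps instead of training a model, so the numbers only illustrate the scheduling, not the real SVC workload.

```python
import multiprocessing
import time


def slow_task(seconds):
    # Hypothetical stand-in for the real worker: it just sleeps.
    time.sleep(seconds)


def run_sequential(n_jobs, seconds):
    # join() inside the loop: each process finishes before the next one starts.
    start = time.perf_counter()
    for _ in range(n_jobs):
        p = multiprocessing.Process(target=slow_task, args=(seconds,))
        p.start()
        p.join()
    return time.perf_counter() - start


def run_parallel(n_jobs, seconds):
    # Start every process first, then join them all.
    start = time.perf_counter()
    jobs = []
    for _ in range(n_jobs):
        p = multiprocessing.Process(target=slow_task, args=(seconds,))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    seq = run_sequential(4, 0.5)
    par = run_parallel(4, 0.5)
    print(f"join inside loop: {seq:.2f}s")
    print(f"join after loop:  {par:.2f}s")
```

On a typical machine the first timing should be roughly the sum of all the sleeps and the second close to a single sleep, because the processes overlap instead of running one after another.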
Your problem is that you join each job immediately after you started it:
for g in grid:
    p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
    jobs.append(p)
    p.start()
    p.join()
join() blocks until the process it is called on has finished. This means your code runs only one process at a time: it starts a process, waits until it has finished, and only then starts the next one.
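The blocking behaviour can be observed directly. This is a minimal sketch assuming a hypothetical `work` function that sleeps briefly: the child is still alive right after `start()`, and `join()` returns only once it has exited.

```python
import multiprocessing
import time


def work():
    # Hypothetical task: simulate half a second of work.
    time.sleep(0.5)


def join_demo():
    p = multiprocessing.Process(target=work)
    p.start()
    alive_before = p.is_alive()  # start() returns immediately; the child is still sleeping
    p.join()                     # blocks here until work() has returned and the child exited
    return alive_before, p.is_alive(), p.exitcode


if __name__ == '__main__':
    before, after, code = join_demo()
    print("alive after start():", before)
    print("alive after join():", after)
    print("exit code:", code)
```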
In order for all processes to run in parallel, you need to first start them all and then join them all:
jobs = []
for g in grid:
    p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
    jobs.append(p)
    p.start()

for j in jobs:
    j.join()
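Putting the corrected loop back into the question's script gives something like the sketch below. Since `X`, `y`, `SEED` and `my_cross_validation_function` are not shown in the question, it swaps in a hypothetical `fake_cv_error` function so that it runs on its own; in real use that call would be replaced by building the `svm.SVC` and cross-validating it.

```python
import itertools
import multiprocessing
from operator import itemgetter


def fake_cv_error(C, gamma):
    # Hypothetical stand-in for my_cross_validation_function(X, y, model=clf, ...):
    # returns a deterministic "error" so the example runs without data.
    return abs(C - 0.5) + gamma


def worker(par, grid_list):
    # Evaluate one (C, gamma) combination and store it in the shared list.
    ll = fake_cv_error(par[0], par[1])
    grid_list.append((par, ll))


def run_grid(grid):
    with multiprocessing.Manager() as manager:
        grid_hx = manager.list()
        jobs = []
        # Start every process first...
        for g in grid:
            p = multiprocessing.Process(target=worker, args=(g, grid_hx))
            jobs.append(p)
            p.start()
        # ...then wait for all of them, so they run concurrently.
        for p in jobs:
            p.join()
        # Copy the results out of the proxy before the manager shuts down.
        return sorted(grid_hx, key=itemgetter(1))


if __name__ == '__main__':
    grid = list(itertools.product([0.1, 1], [0.0]))
    for par, ll in run_grid(grid)[:5]:
        print(par, ll)
```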
According to the documentation, join() blocks the calling thread until the process whose join() method is called terminates. So you are starting each process in the for loop and then waiting for it to finish BEFORE you proceed to the next iteration.
I would suggest moving the joins outside the loop!
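A different option, not what this answer proposes, is to let `multiprocessing.Pool` manage the processes. A pool runs at most `processes` workers (by default one per CPU), so a large grid does not spawn one process per combination at once, and results come back as return values instead of through a `Manager` list. A minimal sketch, again with a hypothetical `evaluate` function in place of the real model fitting:

```python
import itertools
import multiprocessing
from operator import itemgetter


def evaluate(C, gamma):
    # Hypothetical stand-in for fitting svm.SVC(C=C, gamma=gamma) and
    # cross-validating it; returns ((C, gamma), error).
    error = abs(C - 0.5) + gamma
    return (C, gamma), error


def run_grid_with_pool(grid, processes=None):
    # processes=None lets Pool pick the number of workers from the CPU count.
    with multiprocessing.Pool(processes=processes) as pool:
        # starmap unpacks each (C, gamma) tuple into evaluate(C, gamma)
        # and blocks until every combination has been evaluated.
        results = pool.starmap(evaluate, grid)
    return sorted(results, key=itemgetter(1))


if __name__ == '__main__':
    grid = list(itertools.product([0.1, 1], [0.0]))
    for par, err in run_grid_with_pool(grid)[:5]:
        print(par, err)
```

Here the "join" happens implicitly: `starmap` only returns once all tasks are done, while the pool keeps every worker busy in the meantime.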