ADJ
ADJ

Reputation: 5282

Python multiprocessing doesn't seem to use more than one core

I want to use Python multiprocessing to run grid search for a predictive model. When I look at core usage, it always seem to be using only one core. Any idea what I'm doing wrong?

import multiprocessing
from sklearn import svm
import itertools

#first read some data
#X will be my feature Numpy 2D array
#y will be my 1D Numpy array of labels

#define the grid        
C = [0.1, 1]
gamma = [0.0]
params = [C, gamma]
grid = list(itertools.product(*params))
GRID_hx = []

def worker(par, grid_list):
    #define a sklearn model
    clf = svm.SVC(C=g[0], gamma=g[1],probability=True,random_state=SEED)
    #run a cross validation function: returns error
    ll = my_cross_validation_function(X, y, model=clf, n=1, test_size=0.2)
    print(par, ll)
    grid_list.append((par, ll))


if __name__ == '__main__':
   manager = multiprocessing.Manager()
   GRID_hx = manager.list()
   jobs = []
   for g in grid:
      p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
      jobs.append(p)
      p.start()
      p.join()

   print("\n-------------------")
   print("SORTED LIST")
   print("-------------------")
   L = sorted(GRID_hx, key=itemgetter(1))
   for l in L[:5]:
      print l

Upvotes: 35

Views: 5926

Answers (3)

Calvin1602
Calvin1602

Reputation: 9547

I'd say :

for g in grid:
    g.p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
    jobs.append(g.p)
    g.p.start()
for g in grid:
    g.p.join()

Currently you're spawning a job, then waithing for it to be done, then going to the next one.

Upvotes: 5

helmbert
helmbert

Reputation: 37944

Your problem is that you join each job immediately after you started it:

for g in grid:
    p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
    jobs.append(p)
    p.start()
    p.join()

join blocks until the respective process has finished working. This means that your code starts only one process at once, waits until it is finished and then starts the next one.

In order for all processes to run in parallel, you need to first start them all and then join them all:

jobs = []
for g in grid:
    p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
    jobs.append(p)
    p.start()

for j in jobs:
    j.join()

Documentation: link

Upvotes: 50

Robin Nabel
Robin Nabel

Reputation: 2310

According to the documentation the join() command locks the current thread until the specified thread returns. So you are basically starting each thread in the for loop and then wait for it to finish, BEFORE you proceed to the next iteration.

I would suggest moving the joins outside the loop!

Upvotes: 6

Related Questions