MPa
MPa

Reputation: 1142

Python multiprocessing: Simple job split across many processes

EDIT

The proposed code actually worked! I was simply running it from within an IDE that wasn't showing the outputs.

I'm leaving the question up because the comments/answers are instructive


I need to split a big job across many workers. In trying to figure out how to do this, I used the following simple example, with code mostly taken from here. Basically, I am taking a list, breaking it up in shorter sublists (chunks), and asking multiprocessing to print the content of each sublist with a dedicated worker:

import multiprocessing
from math import ceil

# Breaking up the long list in chunks:
def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

# Some simple function 
  def do_job(job_id, data_slice):
      for item in data_slice:
          print("{}_{}".format(job_id, item))

I then do this:

if __name__ == '__main__':

    # My "long" list
    l = [letter for letter in 'abcdefghijklmnopqrstuvwxyz']

    my_chunks = chunks(l, ceil(len(l)/4))

At this point, my_chunks is as expected:

[['a', 'b', 'c', 'd', 'e', 'f', 'g'],
 ['h', 'i', 'j', 'k', 'l', 'm', 'n'],
 ['o', 'p', 'q', 'r', 's', 't', 'u'],
 ['v', 'w', 'x', 'y', 'z']]

Then:

    jobs = []
    for i, s in enumerate(my_chunks):
        j = mp.Process(target=do_job, args=(i, s))
        jobs.append(j)
    for j in jobs:
        print('starting job {}'.format(str(j)))        
        j.start()

Initially, I wrote the question because I was not getting the expected printouts from the do_jobfunction.

Turns out the code works just fine when run from command line.

Upvotes: 0

Views: 2752

Answers (1)

Simon
Simon

Reputation: 424

Maybe it's your first time with multiprocessing? Do you wait for the processes to exit or do you exit the main processes before your processes have time to complete there job?

from multiprocessing import Process
from string import ascii_letters
from time import sleep


def job(chunk):
    done = chunk[::-1]
    print(done)

def chunk(data, parts):
    divided = [None]*parts
    n = len(data) // parts
    for i in range(parts):
        divided[i] = data[i*n:n*(i+1)]
    if len(data) % 2 != 0:
        divided[-1] += [data[-1]]
    return divided


def main():
    data = list(ascii_letters)
    workers = 4
    data_chunks = chunk(data, workers)
    ps = []
    for i in range(4):
        w = Process(target=job, args=(data_chunks[i],))
        w.deamon = True
        w.start()
        ps += [w]
    sleep(2)



if __name__ == '__main__':
    main()

Upvotes: 2

Related Questions