user2552108

Reputation: 1150

Python Multiprocessing Not Working on Windows 10

I am currently working on an image processing project with about 50k images. I want to apply several simple preprocessing steps (resizing, changing the image type, etc.) and save the results to a separate local directory.

I have written the preprocessing code in Python, and it works fine. However, it takes too long, so I decided to use the multiprocessing module. When I run the code, I see some strange behavior.

Please see the code below:

import fnmatch
import multiprocessing as mp
import os


def doPreprocess(f):
    # do the preprocessing steps and save to directory
    pass


if __name__ == '__main__':

    path = "C:/My/Image/Folder/"

    # collect the names of all .jpg files in the folder
    files = fnmatch.filter(os.listdir(path), pat='*.jpg')

    print(len(files))

    pool = mp.Pool(processes=4)
    pool.map_async(doPreprocess, files)

    print("temp")

However, when I run the code with Command Prompt, it skips the preprocessing step and prints temp right away, as if the function didn't run at all. The preprocessed images are not saved either.

Can someone help me?

Thank you in advance.

Note: I am using Windows 10 and Python 3.6

Upvotes: 2

Views: 7805

Answers (1)

api55

Reputation: 11420

I tested your code, and the problem is that your main process ends before the worker processes finish. Pool workers are daemon processes, so they are terminated when the main process exits. If you try this simplified version of your code you will see what I mean:

import multiprocessing as mp


def doPreprocess(f):
    print(f)


if __name__ == '__main__':

    files = ["hello"] * 10
    pool = mp.Pool(processes=4)
    res = pool.map_async(doPreprocess, files)

    print ("temp")

This basically prints temp and exits. But if you add:

res.wait()

just before print("temp"), the main process will wait until all the workers have finished, and only then print temp and exit. It will also work if you change:

res = pool.map_async(doPreprocess, files)

to:

res = pool.map(doPreprocess, files)

Why? map_async is non-blocking: it returns immediately so that the main process can do other work in the meantime. map is blocking, which is more or less the equivalent of calling wait() immediately after pool.map_async.
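For reference, here is a minimal sketch that puts both variants side by side. The names target_name, run_async and run_blocking are made up for illustration, and target_name is only a stand-in for the real preprocessing (it just computes an output filename). One assumption worth flagging: the sketch uses res.get() instead of res.wait(), because get() also returns the results and re-raises any exception that happened in a worker, which wait() does not.

```python
import multiprocessing as mp
import os


def target_name(filename):
    """Hypothetical stand-in for the real preprocessing step.

    It only derives the output filename; the real function would
    load, transform and save the image.
    """
    stem, _ext = os.path.splitext(filename)
    return stem + ".png"


def run_async(files, processes=4):
    """Submit without blocking, then explicitly collect the results."""
    with mp.Pool(processes=processes) as pool:
        res = pool.map_async(target_name, files)
        # The main process could do other work here while workers run.
        # get() blocks until all tasks are done and re-raises worker errors.
        return res.get()


def run_blocking(files, processes=4):
    """Blocking variant: map() returns only when every task has finished."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(target_name, files)


if __name__ == "__main__":
    # The guard is required on Windows, where workers re-import this module.
    files = ["img_%d.jpg" % i for i in range(10)]
    print(run_async(files))
    print(run_blocking(files))
    print("temp")
```

In both functions the results are collected inside the with block, since leaving the block terminates the pool. With 50k images it may also be worth passing a chunksize to map, but that is a tuning choice to measure rather than a given.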

Upvotes: 2
