Rami

Reputation: 8314

Multiprocessing in Python, each process handles part of a file

I have a file that I want to process in Python. Each line in this file is a path to an image, and I would like to call a feature extraction algorithm on each image.

I would like to divide the file into smaller chunks and have each chunk processed in a separate, parallel process. What are the good state-of-the-art libraries or solutions for this kind of multiprocessing in Python?

Upvotes: 1

Views: 1682

Answers (1)

jfs

Reputation: 414079

Your description suggests that a simple thread (or process) pool would work; note that you don't need to split the file yourself, because the chunksize argument of imap_unordered() already hands the paths to the workers in batches:

#!/usr/bin/env python
from multiprocessing.dummy import Pool # thread pool
from tqdm import tqdm # $ pip install tqdm # simple progress report

def mp_process_image(filename):
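    # process_image() is the user's feature-extraction function, assumed to be defined elsewhere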
    try:
        return filename, process_image(filename), None
    except Exception as e:
        return filename, None, str(e)

def main():
    # consider every non-blank line in the input file to be an image path
    image_paths = (line.strip()
                   for line in open('image_paths.txt') if line.strip())
    pool = Pool() # number of threads equal to number of CPUs
    it = pool.imap_unordered(mp_process_image, image_paths, chunksize=100)
    for filename, result, error in tqdm(it):
        if error is not None:
           print(filename, error)

if __name__ == "__main__":
    main()

I assume that process_image() is CPU-bound and that it releases the GIL, i.e., it does the main job in a C extension such as OpenCV. If process_image() doesn't release the GIL, then remove .dummy from the Pool import to use processes instead of threads.
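For the process-based variant, a minimal sketch might look like this (it assumes, as above, that process_image() is defined at module level so it can be pickled and sent to the worker processes):

#!/usr/bin/env python
from multiprocessing import Pool # process pool instead of thread pool
from tqdm import tqdm # $ pip install tqdm

def mp_process_image(filename):
    # runs in a worker process; process_image() must be importable there
    try:
        return filename, process_image(filename), None
    except Exception as e:
        return filename, None, str(e)

def main():
    # consider every non-blank line in the input file to be an image path
    image_paths = (line.strip()
                   for line in open('image_paths.txt') if line.strip())
    with Pool() as pool: # one worker process per CPU by default
        it = pool.imap_unordered(mp_process_image, image_paths, chunksize=100)
        for filename, result, error in tqdm(it):
            if error is not None:
                print(filename, error)

if __name__ == "__main__":
    main()

The if __name__ == "__main__" guard matters more here than in the threaded version: on platforms that spawn rather than fork (e.g., Windows), each worker re-imports the module, and the guard prevents it from recursively starting new pools.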

Upvotes: 4
