Reputation: 8314
I have a file that I want to process in Python. Each line in this file is a path to an image, and I would like to call a feature extraction algorithm on each image.
I would like to divide the file into smaller chunks and process each chunk in a separate, parallel process. What are good state-of-the-art libraries or solutions for this kind of multiprocessing in Python?
Upvotes: 1
Views: 1682
Reputation: 414079
Your description suggests that a simple thread (or process) pool would work:
#!/usr/bin/env python
from multiprocessing.dummy import Pool  # thread pool
from tqdm import tqdm  # $ pip install tqdm # simple progress report

def mp_process_image(filename):
    # wrap your feature extractor so that one bad image can't kill the run
    try:
        return filename, process_image(filename), None
    except Exception as e:
        return filename, None, str(e)

def main():
    # consider every non-blank line in the input file to be an image path
    image_paths = (line.strip()
                   for line in open('image_paths.txt') if line.strip())
    pool = Pool()  # number of threads equal to number of CPUs
    # hand paths to the workers in chunks of 100 to amortize the hand-off overhead
    it = pool.imap_unordered(mp_process_image, image_paths, chunksize=100)
    for filename, result, error in tqdm(it):
        if error is not None:
            print(filename, error)

if __name__ == "__main__":
    main()
I assume that process_image() is CPU-bound and that it releases the GIL, i.e., it does the main job in a C extension such as OpenCV. If process_image() doesn't release the GIL, then remove the word .dummy from the Pool import to use processes instead of threads.
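A minimal sketch of that process-based variant, assuming the same image_paths.txt and a picklable, module-level process_image() (names carried over from above):

#!/usr/bin/env python
from multiprocessing import Pool  # process pool: each worker has its own GIL
from tqdm import tqdm

def mp_process_image(filename):
    # must stay a module-level function so the process pool can pickle it
    try:
        return filename, process_image(filename), None
    except Exception as e:
        return filename, None, str(e)

def main():
    image_paths = (line.strip()
                   for line in open('image_paths.txt') if line.strip())
    with Pool() as pool:  # one worker process per CPU by default
        it = pool.imap_unordered(mp_process_image, image_paths, chunksize=100)
        for filename, result, error in tqdm(it):
            if error is not None:
                print(filename, error)

if __name__ == "__main__":
    main()

The trade-off with processes is that arguments and results are pickled between them, so keep the per-item payload small, e.g., paths in, feature vectors out.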
Upvotes: 4