Reputation: 103
I'm writing an application/GUI in PyQt5 and would like to store a large number of images (>5000 RGB images).
Right now I have a function that stores every picture with cv2.imwrite, but this process takes a lot of time. I read here on Stack Overflow that I can do this with multiprocessing, but I'm very new to Python.
My multiprocessing code:
def SaveImages(self):
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target = self.SaveAllImages, args=self)
        jobs.append(p)
        p.start()
The function SaveAllImages contains the basic code for storing the image for each frame. If I run this code, I get the following error:
p = multiprocessing.Process(target = SaveAllImages, args=self)
NameError: name 'SaveAllImages' is not defined
But SaveAllImages is defined: def SaveAllImages(self)
So my questions are:
Why do I get this error?
How can I implement very simple multiprocessing for storing images?
Upvotes: 2
Views: 1688
Reputation: 43523
Before you try to improve anything, you should always measure performance.
Use a disk testing program to see what the maximum sustained write throughput of your disk is.
Then use a performance monitoring program to check the write throughput that your program generates (without multithreading/processing). If your program can reach roughly the same throughput as the test program most of the time, then there is little you can do.
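If you want to do that second check from Python itself, here's a rough sketch that estimates sequential write throughput; the file name, block size, and block count are arbitrary placeholders, not anything your program uses:

import os
import time

# Rough sketch: write a dummy file in fixed-size blocks and time it to
# estimate the sequential write throughput your program can expect.
def measure_write_throughput(path="throughput_test.bin",
                             block_size=4 * 1024 * 1024, blocks=256):
    data = os.urandom(block_size)      # one 4 MiB block of random bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())           # make sure the bytes actually reach the disk
    elapsed = time.perf_counter() - start
    mib = block_size * blocks / (1024 * 1024)
    os.remove(path)
    print("Wrote {:.0f} MiB in {:.2f} s ({:.1f} MiB/s)".format(
        mib, elapsed, mib / elapsed))

measure_write_throughput()

If your cv2.imwrite loop moves data at roughly this rate, the disk, not Python, is the bottleneck.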
Assuming that you are using a regular hard disk, the best way to improve write performance is to use an SSD instead.
Upvotes: 1
Reputation: 17369
The error you're seeing is because you're calling a method that doesn't exist, probably because it's not part of self.
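To show just that naming fix, here's a minimal sketch keeping the shape of the method from your question; the ImageSaver class is a hypothetical stand-in for your own class, and as I explain below, threads are probably the better tool anyway:

import multiprocessing

class ImageSaver:
    def SaveAllImages(self):
        pass  # placeholder for the actual per-frame saving code

    def SaveImages(self):
        jobs = []
        for i in range(5):
            # target is the bound method self.SaveAllImages, not the bare
            # name SaveAllImages; no args tuple is needed because self
            # travels with the bound method. Note that the instance must
            # be picklable for multiprocessing to work at all.
            p = multiprocessing.Process(target=self.SaveAllImages)
            jobs.append(p)
            p.start()
        for p in jobs:
            p.join()  # wait for all workers to finish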
You'll likely see better performance with multithreading than multiprocessing. Multiprocessing is best for CPU-bound tasks, because Python's global interpreter lock lets only one thread execute Python bytecode at a time; I/O such as writing a file to disk releases the lock, so threads work well here. Multiprocessing is a hack to get around this lock. It's nastier to work with than threading and it's best to avoid it unless absolutely necessary.
Multithreading is likely enough for your use case and it won't create lots of gotchas for a new programmer. Here's a working sample setup using Python 3's Futures API that will easily scale your problem size, just add your arguments and actual save code in the marked places.
import concurrent.futures

# Save single image
def save_image(image_arg):
    # your image save code goes here
    print("Working on image {}...".format(image_arg))
    return True

# max_workers specifies the number of threads. If None then use 5x your CPU count
with concurrent.futures.ThreadPoolExecutor(max_workers=None) as executor:
    # Images we'll save. Depending on how you generate your images you might not
    # want to materialize a list like this to avoid running out of memory.
    image_args = ["image1", "image2", "image3"]

    # Submit futures to the executor pool.
    # Map each future back to the arguments used to create that future. That way
    # if one fails we know which image it was that failed.
    future_to_args = {executor.submit(save_image, image_arg): image_arg
                      for image_arg in image_args}

    # Images are being saved in worker threads. They will complete in any order.
    for future in concurrent.futures.as_completed(future_to_args):
        image_arg = future_to_args[future]
        try:
            result = future.result()
        except Exception as exc:
            print("Saving image {} generated an exception: {}".format(image_arg, exc))
        else:
            print("Image {} saved successfully.".format(image_arg))
If you insist on multiprocessing, just use ProcessPoolExecutor instead. That might be worthwhile if you also want to generate your images in parallel.
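The swap is mechanical; here's a minimal sketch reusing the same placeholder save_image stub, with the __main__ guard that process pools need on platforms that spawn their workers:

import concurrent.futures

# Same placeholder as above; real image-saving code would go here.
def save_image(image_arg):
    print("Working on image {}...".format(image_arg))
    return True

# Worker processes re-import this module, so the pool setup must be
# guarded; for processes, max_workers=None defaults to your CPU count.
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
        results = list(executor.map(save_image, ["image1", "image2", "image3"]))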
Whether ThreadPoolExecutor or ProcessPoolExecutor is better depends a lot on what the rest of your workload is and how you structured it. Try both to see which works better for you. Note that multiprocessing places restrictions on communication and sharing state between workers, hence why I suggest trying threads first.
Upvotes: 4