Reputation: 103
I'm writing an application/GUI in PyQt5 and would like to store a large number of images (>5000 RGB images).
Right now I have a function that stores every picture with cv2.imwrite, but this process takes a lot of time. I read here on Stack Overflow that I can do this with multiprocessing, but I'm very new to Python.
My multiprocessing code:
def SaveImages(self):
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target = self.SaveAllImages, args=self)
        jobs.append(p)
        p.start()
The function SaveAllImages contains the basic code for storing the image for each frame. If I run this code, I get the following error:
p = multiprocessing.Process(target = SaveAllImages, args=self)
NameError: name 'SaveAllImages' is not defined
But SaveAllImages is defined: def SaveAllImages(self)
So my questions are:
Why do I get this error?
How can I implement very simple multiprocessing for storing images?
Upvotes: 2
Views: 1688
Reputation: 43523
Before you try to improve anything, you should always measure performance.
Use a disk testing program to see what the maximum sustained write throughput of your disk is.
Then use a performance monitoring program to check the write throughput that your program generates (without multithreading/processing). If your program can reach roughly the same throughput as the test program most of the time, then there is little you can do.
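If you want to do that second check from Python itself, here's a rough sketch that estimates sequential write throughput; the file name, block size, and block count are arbitrary placeholders, not anything your program uses:

import os
import time

# Rough sketch: write a dummy file in fixed-size blocks and time it to
# estimate the sequential write throughput your program can expect.
def measure_write_throughput(path="throughput_test.bin",
                             block_size=4 * 1024 * 1024, blocks=256):
    data = os.urandom(block_size)      # one 4 MiB block of random bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())           # make sure the bytes actually reach the disk
    elapsed = time.perf_counter() - start
    mib = block_size * blocks / (1024 * 1024)
    os.remove(path)
    print("Wrote {:.0f} MiB in {:.2f} s ({:.1f} MiB/s)".format(
        mib, elapsed, mib / elapsed))

measure_write_throughput()

If your cv2.imwrite loop moves data at roughly this rate, the disk, not Python, is the bottleneck.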
Assuming that you are using a regular hard disk, the best way to improve write performance is to use an SSD instead.
Upvotes: 1
Reputation: 17369
The error you're seeing is because you're calling a method that doesn't exist, probably because it's not part of self.
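To show just that naming fix, here's a minimal sketch keeping the shape of the method from your question; the ImageSaver class is a hypothetical stand-in for your own class, and as I explain below, threads are probably the better tool anyway:

import multiprocessing

class ImageSaver:
    def SaveAllImages(self):
        pass  # placeholder for the actual per-frame saving code

    def SaveImages(self):
        jobs = []
        for i in range(5):
            # target is the bound method self.SaveAllImages, not the bare
            # name SaveAllImages; no args tuple is needed because self
            # travels with the bound method. Note that the instance must
            # be picklable for multiprocessing to work at all.
            p = multiprocessing.Process(target=self.SaveAllImages)
            jobs.append(p)
            p.start()
        for p in jobs:
            p.join()  # wait for all workers to finish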
You'll likely see better performance with multithreading than multiprocessing. Multiprocessing is best for CPU-bound tasks, because Python's global interpreter lock lets only one thread execute Python bytecode at a time; I/O such as writing a file to disk releases the lock, so threads work well here. Multiprocessing is a hack to get around this lock. It's nastier to work with than threading and it's best to avoid it unless absolutely necessary.
Multithreading is likely enough for your use case and it won't create lots of gotchas for a new programmer. Here's a working sample setup using Python 3's Futures API that will easily scale your problem size, just add your arguments and actual save code in the marked places.
import concurrent.futures

# Save single image
def save_image(image_arg):
    # your image save code goes here
    print("Working on image {}...".format(image_arg))
    return True

# max_workers specifies the number of threads. If None then use 5x your CPU count
with concurrent.futures.ThreadPoolExecutor(max_workers=None) as executor:
    # Images we'll save. Depending on how you generate your images you might not
    # want to materialize a list like this to avoid running out of memory.
    image_args = ["image1", "image2", "image3"]

    # Submit futures to the executor pool.
    # Map each future back to the arguments used to create that future. That way
    # if one fails we know which image it was that failed.
    future_to_args = {executor.submit(save_image, image_arg): image_arg
                      for image_arg in image_args}

    # Images are being saved in worker threads. They will complete in any order.
    for future in concurrent.futures.as_completed(future_to_args):
        image_arg = future_to_args[future]
        try:
            result = future.result()
        except Exception as exc:
            print("Saving image {} generated an exception: {}".format(image_arg, exc))
        else:
            print("Image {} saved successfully.".format(image_arg))
If you insist on multiprocessing, just use ProcessPoolExecutor instead. That might be worthwhile if you also want to generate your images in parallel.
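The swap is mechanical; here's a minimal sketch reusing the same placeholder save_image stub, with the __main__ guard that process pools need on platforms that spawn their workers:

import concurrent.futures

# Same placeholder as above; real image-saving code would go here.
def save_image(image_arg):
    print("Working on image {}...".format(image_arg))
    return True

# Worker processes re-import this module, so the pool setup must be
# guarded; for processes, max_workers=None defaults to your CPU count.
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
        results = list(executor.map(save_image, ["image1", "image2", "image3"]))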
Whether ThreadPoolExecutor or ProcessPoolExecutor is better depends a lot on what the rest of your workload is and how you structured it. Try both to see which works better for you. Note that multiprocessing places restrictions on communication and sharing state between workers, hence why I suggest trying threads first.
Upvotes: 4