Reputation: 12246
I have the following Python code:
import sys
import os
from concurrent.futures import ThreadPoolExecutor
VIDEOS = [ # A list of 9 videos
{... bla bla ...},
{... bla bla ...},
{... bla bla ...}
]
SAMPLING_FREQUENCIES = [1, 2.4, 3.71, ... , 14.3] # A list of 8 frequencies
def process_video(video_obj, sampling_frequency, process_work_dir):
os.makedirs(process_work_dir)
# ... do some heavy processing ...
if __name__ == '__main__':
output_work_dir = sys.argv[1]
os.makedirs(output_work_dir)
executor = ThreadPoolExecutor(max_workers=4)
for video_obj in VIDEOS:
for samp_fps in SAMPLING_FREQUENCIES: # Totally 8 * 9 threads should be waiting to be executed
work_dir = os.path.join(output_work_dir, os.path.basename(video_obj['path']), str(samp_fps))
executor.submit(lambda : process_video(video_obj, samp_fps, work_dir))
What I observe is, that at first, as expected, 4 threads are running. I can validate it by making sure that exactly 4 work directories are being created.
Then, the first thread finished its work, and as expected, another thread that had been waiting, started running.
The problem is, that although I created 72 threads (9 videos times 8 sampling frequencies), no threads were executed anymore. When the 5th thread finished its work, the application terminated.
Is it a bug, or do I have a problem with understanding the ThreadPoolExecutor API?
I use Python 3.6.5
Upvotes: 0
Views: 1992
Reputation: 11413
If you want to executor to complete all tasks then you might want to declare it in a with
block. Here the print statement I added would be printed after all threads are complete.
The problem your seeing is the executor will submit things and not wait, letting your code die when it hits the EOF
killing tasks waiting as the interpreter shuts down.
*This is not the case please ignore
Also note this line:
executor.submit(lambda : process_video(video_obj, samp_fps, work_dir))
This seems like the lambda would be the result of the process_video
method which means it would run in the main thread. I think the usage is clearer and cleaner if specified as the docs propose.
executor.submit(process_video, video_obj, samp_fps, work_dir)
Try this:
if __name__ == '__main__':
output_work_dir = sys.argv[1]
os.makedirs(output_work_dir)
with ThreadPoolExecutor(max_workers=4) as executor:
for video_obj in VIDEOS:
for samp_fps in SAMPLING_FREQUENCIES: # Totally 8 * 9 threads should be waiting to be executed
work_dir = os.path.join(output_work_dir, os.path.basename(video_obj['path']), str(samp_fps))
executor.submit(process_video, video_obj, samp_fps, work_dir)
print("Hello, all done with thread work!")
Upvotes: 1