Reputation: 125
I have found relating questions to mine but cannot find one that solves my problem.
The problem
I am building a program that monitors several directories, then spawns a subprocess based on directory or particular filename.
These subprocesses can often take up to several hours (for example if rendering 000's of PDFs) to complete. Because of this, I would like to know the best way for the program to continue monitoring the folders in parallel to the subprocess that is still running, and be able to spawn additional subprocesses, as long as they are of a different type to the subprocess currently running.
Once the subprocess has completed, the program should be able to receive a return code, that subprocess would be available to run again.
Code as it stands
This is the simple code that runs the program currently, calling functions when a file is found:
while 1:
paths_to_watch = ['/dir1','/dir2','/dir3','/dir4']
after = {}
for x in paths_to_watch:
key = x
after.update({key :[f for f in os.listdir(x)]})
for key, files in after.items():
if(key == '/dir1'):
function1(files)
elif(key == '/dir2'):
function2(files)
elif(key == '/dir3'):
function3(files)
elif(key == '/dir4'):
function3(files)
time.sleep(10)
Of course this means that the program waits for the process to be finished before it continues to check for files in paths_to_watch
From other questions, it looks like this is something that could be handled with process pools, however my lack of knowledge in this area means I do not know where to start.
Upvotes: 0
Views: 39
Reputation: 44303
I am assuming that you can use threads rather than processes, an assumption that will hold up if your functions function1
thorugh function4
are predominately I/O bound. Otherwise you should substitute ProcessPoolExecutor
for ThreadPoolExecutor
in the code below. Right now your program loops indefinitely, so the threads too will never terminate. I am also assuming that that functions function1
through function4
have unique implementations.
import os
import time
from concurrent.futures import ThreadPoolExecutor
def function1(files):
pass
def function2(files):
pass
def function3(files):
pass
def function4(files):
pass
def process_path(path, function):
while True:
files = os.listdir(path)
function(files)
time.sleep(10)
def main():
paths_to_watch = ['/dir1','/dir2','/dir3','/dir4']
functions = [function1, function2, function3, function4]
with ThreadPoolExecutor(max_workers=len(paths_to_watch)) as executor:
results = executor.map(process_path, paths_to_watch, functions)
for result in results:
# threads never return so we never get a result
print(result)
if __name__ == '__main__':
main()
Upvotes: 1