R3uben

Reputation: 125

Python running subprocesses without waiting whilst still receiving return codes

I have found relating questions to mine but cannot find one that solves my problem.

The problem

I am building a program that monitors several directories, then spawns a subprocess based on directory or particular filename.

These subprocesses can often take several hours to complete (for example, when rendering thousands of PDFs). Because of this, I would like to know the best way for the program to keep monitoring the folders in parallel with the subprocess that is still running, and to be able to spawn additional subprocesses, as long as they are of a different type from the subprocess currently running.

Once a subprocess has completed, the program should receive its return code, and that subprocess type should then be available to run again.
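For reference, the "spawn without waiting, collect the return code later" part maps directly onto `subprocess.Popen`: the constructor returns immediately, and `poll()` reports the return code without blocking. A minimal sketch (the child command here is a stand-in for a long render job, not from the original post):

```python
import subprocess
import sys

# Popen returns immediately; the monitoring loop is free to continue.
# The command below is a placeholder for a long-running render job.
proc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)'])

first_check = proc.poll()   # None while the child is still running
return_code = proc.wait()   # or keep calling poll() each loop cycle
print(first_check, return_code)
```

`poll()` can be called once per monitoring cycle; when it stops returning `None`, the return code is available and that job type can be considered free again.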

Code as it stands

This is the simple code that runs the program currently, calling functions when a file is found:

import os
import time

paths_to_watch = ['/dir1', '/dir2', '/dir3', '/dir4']

while True:
    after = {path: os.listdir(path) for path in paths_to_watch}

    for key, files in after.items():
        if key == '/dir1':
            function1(files)
        elif key == '/dir2':
            function2(files)
        elif key == '/dir3':
            function3(files)
        elif key == '/dir4':
            function3(files)
    time.sleep(10)

Of course, this means that the program waits for each function to finish before it continues to check for files in paths_to_watch.

From other questions, it looks like this is something that could be handled with process pools, however my lack of knowledge in this area means I do not know where to start.
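The "only one job of each type at a time" constraint can be layered on top of an executor by keeping one `Future` per job type and only submitting a new one once the previous has finished. A hedged sketch, with invented job types, handler, and timings purely for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

running = {}  # job type -> the Future currently handling that type

def handle(job_type, files):
    # Stand-in for real work (rendering PDFs, etc.).
    time.sleep(0.1)
    return f'{job_type} processed {len(files)} files'

with ThreadPoolExecutor(max_workers=4) as executor:
    for _ in range(3):  # stand-in for the endless monitoring loop
        for job_type in ['/dir1', '/dir2']:
            future = running.get(job_type)
            # Submit only if this type has no job in flight.
            if future is None or future.done():
                running[job_type] = executor.submit(handle, job_type, ['a.pdf'])
        time.sleep(0.05)
```

Each loop iteration skips any type whose `Future` is still pending, so at most one job of each type runs at once while other types proceed in parallel.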

Upvotes: 0

Views: 39

Answers (1)

Booboo

Reputation: 44303

I am assuming that you can use threads rather than processes, an assumption that will hold up if your functions function1 through function4 are predominantly I/O bound. Otherwise, substitute ProcessPoolExecutor for ThreadPoolExecutor in the code below. Right now your program loops indefinitely, so the threads will never terminate either. I am also assuming that functions function1 through function4 have unique implementations.

import os
import time
from concurrent.futures import ThreadPoolExecutor

def function1(files):
    pass


def function2(files):
    pass


def function3(files):
    pass


def function4(files):
    pass


def process_path(path, function):
    while True:
        files = os.listdir(path)
        function(files)
        time.sleep(10)


def main():
    paths_to_watch = ['/dir1','/dir2','/dir3','/dir4']
    functions = [function1, function2, function3, function4]
    with ThreadPoolExecutor(max_workers=len(paths_to_watch)) as executor:
        results = executor.map(process_path, paths_to_watch, functions)
        for result in results:
            # threads never return so we never get a result
            print(result)

if __name__ == '__main__':
    main()

Upvotes: 1
