Subhayan Bhattacharya

Reputation: 5705

Invoking a subprocess command from inside a Python thread

I have a very simple use case in which I have to find the files that have been modified in the last 10 minutes inside two different directories.

Since there are two different directories, I spin up two separate threads, and each thread runs the logic that checks which files have been modified.

Below is the code:

import threading
import os
import time
from subprocess import Popen, PIPE

def getLatestModifiedFiles(seconds, _dir):
    # Use the `seconds` argument instead of a hard-coded 300.
    files = (fle for rt, _, f in os.walk(_dir) for fle in f
             if time.time() - os.stat(os.path.join(rt, fle)).st_mtime < seconds)
    return list(files)

def getLatestModifiedFilesUnix(minutes, _dir):
    # find's -mmin test is in minutes; a negative value means "less than N minutes ago".
    p = Popen(['/usr/bin/find', _dir, '-mmin', str(-minutes)],
              stdout=PIPE, stderr=PIPE, universal_newlines=True)
    out, err = p.communicate()
    print(out.strip("\r\n"))
    if err:
        print(err)

def run(logPath):
    threadName = threading.current_thread().name
    getLatestModifiedFilesUnix(10, logPath)
    #files = getLatestModifiedFiles(600, logPath)
    #for file in files:
    #    print("message from %(ThreadName)s...%(File)s" % {'ThreadName': threadName, 'File': file})


if __name__ == "__main__":
    logPaths = ["/home/appmyser/Rough", "/home/appmyser/Rough2"]
    threads = []
    for path in logPaths:
        t = threading.Thread(target=run, args=(path,))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

The function getLatestModifiedFiles finds the recently modified files using native Python code, whereas getLatestModifiedFilesUnix does the same thing using the Unix find command.

In the second case I use subprocess, which to my knowledge creates a new process. My question is: is it good practice to invoke a subprocess from within a thread? Are there any ramifications that I should consider?

Also, what is the parent process of the newly created subprocess? Can someone explain in detail how it works?

Many thanks in advance.

Upvotes: 0

Views: 153

Answers (1)

Ondrej K.

Reputation: 9664

Starting from the end: a multi-threaded process is still only one process. Regardless of which thread forked (and exec'ed), that process is the parent of the newly spawned child. You can run ps -efL on your system to have a look. If you have any multi-threaded applications running (very likely), you will see the individual threads, each with its own lightweight process (LWP) ID, sharing a single process ID.
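
To check the parent/child relationship yourself, here is a minimal sketch (Python 3; the helper names child_parent_pid and spawn_in_thread are made up for this illustration). It starts a child interpreter from a worker thread and has the child report its parent PID, which should match the PID of the whole process, not anything thread-specific:

```python
import os
import subprocess
import sys
import threading


def child_parent_pid():
    """Spawn a child interpreter and return the parent PID it reports."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.getppid())"],
        universal_newlines=True,
    )
    return int(out.strip())


def spawn_in_thread():
    """Call child_parent_pid() from a worker thread and return its result."""
    result = []
    t = threading.Thread(target=lambda: result.append(child_parent_pid()))
    t.start()
    t.join()
    return result[0]


if __name__ == "__main__":
    print("this process: ", os.getpid())
    print("child's parent:", spawn_in_thread())
```

Both numbers should be the same, no matter which thread did the spawning.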

As for ramifications: when using subprocess, you really should not run into any surprises just for doing that. At a lower level, care is needed when you fork while multiple threads are running, because the newly created process will have only one thread (the one that called fork). That can lead to all sorts of fun, for instance if the child depended on other threads to release locks. But since exec runs shortly thereafter, the child runs new code anyway.
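
As a rough illustration of the "no surprises" case, the sketch below (Python 3; run_command and run_all are hypothetical names, and the commands are placeholders rather than your find calls) runs one subprocess per worker thread. Each thread creates its own Popen object and waits only on that child, so nothing is shared between threads:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(args):
    """Run one command to completion; return (exit code, stdout, stderr)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    out, err = proc.communicate()  # waits for this child only
    return proc.returncode, out, err


def run_all(commands, max_workers=2):
    """Run each command from a worker thread; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Placeholder commands standing in for the two find invocations.
    cmds = [[sys.executable, "-c", "print('one')"],
            [sys.executable, "-c", "print('two')"]]
    for code, out, err in run_all(cmds):
        print(code, out.strip(), err.strip())
```

Returning the exit code and stderr to the caller, instead of printing from inside the thread, also keeps output from the two threads from interleaving.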


This is not directly the subject of your question, but I would object to calling find to get a list of files. I'd prefer dealing with it "in house".

Also, in the generator in getLatestModifiedFiles, calling time.time() for each comparison is not only more expensive, it also effectively means that the goalposts move depending on when each item gets its turn to be processed.
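
One possible "in house" version that fixes the reference time once could look like the sketch below. The name recently_modified and the optional now parameter are my own choices for illustration, not anything from your code:

```python
import os
import time


def recently_modified(directory, seconds, now=None):
    """Return paths under `directory` modified within the last `seconds`."""
    # Take the reference instant once so every file is judged against the same cutoff.
    cutoff = (time.time() if now is None else now) - seconds
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable between listing and stat; skip it.
                continue
            if mtime >= cutoff:
                matches.append(path)
    return matches
```

For the last 10 minutes you would call recently_modified(path, 600) from each thread. Unlike the original generator, it returns full paths rather than bare file names.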

Upvotes: 1
