d0g

Reputation: 1399

How to have several (python) processes watch a folder for items but take action one at a time?

Say I have a python script which watches a folder for new files, and then processes the files (one at a time) based on certain criteria (in their names.)

I need to run several of these "watchers" at the same time, so that they can process several files at once. (Rendering video.)

Once a watcher picks up a file for processing, it renames it (prepending rendering_)

What's the best way to make sure that 2 or more of the watchers don't pick up the same file at the same time and try to render the same job?

My only idea is to have each 'watcher' check only at a fixed second of the minute, so that process 1 checks at :01 past the minute, process 2 at :02, etc. But this seems silly, and every check would have to wait up to a whole minute.

Just to clarify ... say I have 4 instances of watcher running. In the watch folder 7 items are added: job1..job7. I want 1 watcher to pick up 1 job.

When a watcher is done, it should grab the next job. So watcher1 might do job1, watcher2 does job2, etc.

When watcher1 is done with job1, it should pick up job5.

I hope that's clear.

Also, I want each 'watcher' running in its own Terminal window, where we can see its progress, as well as easily terminate, or launch more watchers.
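The rename-on-pickup idea described above can itself serve as the lock: on POSIX filesystems, os.rename is atomic, so if two watchers race for the same file, exactly one rename succeeds and the loser just moves on. A minimal sketch of that (the helper name try_claim is mine, not from the question):

```python
import os

def try_claim(path):
    """Try to claim a job file by renaming it with the 'rendering_' prefix.
    On POSIX filesystems os.rename is atomic, so if two watchers race for
    the same file, only one rename succeeds; the loser gets an OSError and
    simply moves on to the next file."""
    folder, name = os.path.split(path)
    claimed = os.path.join(folder, 'rendering_' + name)
    try:
        os.rename(path, claimed)
    except OSError:
        return None      # another watcher claimed it first
    return claimed       # we own this job now
```

Each watcher would call try_claim on every candidate file and only render the ones where it gets a non-None path back.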

Upvotes: 0

Views: 166

Answers (2)

r.ook

Reputation: 13888

To expand on my comment, you can rename the files so that each watcher tracks its own file extension, like so:

watcher 1 -> check for .step0 files
             rename to .step1 when finished
watcher 2 -> check for .step1 files
             rename to .step2 when finished
...
watcher n -> check for .step{n-1} files
             rename to .final_format when finished

To demonstrate, here's a sample using multiprocessing to instantiate 4 different watchers:

import glob
import os
import time
from multiprocessing import Process

path = 'Watcher Demo'

class Watcher(object):
    def __init__(self, num):
        self.num = num
        self.lifetime = 50.0

    def start(self):
        start = time.time()
        targets = os.path.join(path, f'*.step{self.num - 1}')
        while time.time() - start <= self.lifetime:
            for filename in glob.glob(targets):
                time.sleep(2)  # artificial wait so we can see the effects
                with open(filename, 'a') as file:
                    file.write(f"I've been touched inappropriately by watcher {self.num}\n")
                newname = os.path.splitext(filename)[0] + f'.step{self.num}'
                os.rename(filename, newname)

def create_file():
    for i in range(7):
        filename = os.path.join(path, f'job{i}.step0')
        with open(filename, 'w') as file:
            file.write(f'new file {i}\n')
        time.sleep(5)

if __name__ == '__main__':
    if not os.path.exists(path):
        os.mkdir(path)
    watchers = [Watcher(i).start for i in range(1, 5)]
    processes = [Process(target=p) for p in [create_file] + watchers]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()

Which will create and process files like so:

create_file()          -> *newfile*  -> job0.step0
Watcher(1).start()     -> job0.step0 -> job0.step1
watcher2('job0.step1') -> job0.step1 -> job0.step2
watcher3('job0.step2') -> job0.step2 -> job0.step3
watcher4('job0.step3') -> job0.step3 -> job0.step4

And the files (e.g. job0.step4) will be done in order:

new file 0
I've been touched inappropriately by watcher 1
I've been touched inappropriately by watcher 2
I've been touched inappropriately by watcher 3
I've been touched inappropriately by watcher 4

I haven't renamed the last step to a final format since this is just a demo, but that's easily done, as your final code should have distinct watchers rather than generic ones anyhow.

With the multiprocessing module you won't see a separate terminal for each watcher, but this is just to demonstrate the concept... you can always switch to the subprocess module.
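As a sketch of that switch (the helper names here are mine, and the new-console flag is Windows-only):

```python
import subprocess
import sys

def watcher_command(script, num):
    # Hypothetical convention: the watcher script takes its number as argv[1].
    return [sys.executable, script, str(num)]

def launch_in_new_console(cmd):
    if sys.platform == 'win32':
        # On Windows, CREATE_NEW_CONSOLE opens the watcher in its own
        # terminal window, where you can watch its progress or kill it.
        return subprocess.Popen(cmd, creationflags=subprocess.CREATE_NEW_CONSOLE)
    # Elsewhere, fall back to a plain background process.
    return subprocess.Popen(cmd)
```

launch_in_new_console(watcher_command('watcher.py', 1)) would then start watcher 1 in its own window, and launching more watchers is just more calls.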


As a side note, I did notice a bit of a performance dip while testing this. I'm assuming it's because the program is continuously looping and watching. A better, more efficient way would be to schedule your watches as tasks that run at specific times. You could run watch1 every hour on the hour, watch2 at 15 minutes past, watch3 at 30 minutes past... etc. This is a far more efficient approach, as it only looks for files once and only processes them if found.
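Short of a full scheduler, simply sleeping between checks already removes most of that dip. A sketch of such a polling loop (the interval value is arbitrary):

```python
import glob
import time

def watch(pattern, interval=5.0):
    """Yield files matching `pattern`, polling every `interval` seconds
    instead of spinning in a tight loop."""
    while True:
        for filename in glob.glob(pattern):
            yield filename
        time.sleep(interval)
```

Each watcher would iterate over watch(...) and process (and rename) every filename it yields.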

Upvotes: 0

Alexis Drakopoulos

Reputation: 1145

You should be using something like multiprocessing I think.

What you can do is have 1 master program that watches for files constantly.

Then, when it detects something, that master program sends it off to one slave and continues watching.

So instead of 5 scripts looking, have 1 looking and the rest processing when the one looking tells them to.

You asked how I would do this; I'm not experienced, and this is probably not a great way to do it:

In order to do this you can have the main script store the data you want in a variable temporarily. Let's say the variable is called "Data".

Then, if you're on Windows, you can use something like subprocess to run it from the master script:

subprocess.run(["python", "slave_file.py"])

Then you can have another python script (the slave scripts) which do:

from your_master_script import x

and then do things.
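A more concrete sketch of that master/worker idea uses a multiprocessing.Queue: the master watches and enqueues jobs, and each worker blocks until one arrives, so no two workers ever get the same job (all names here are mine, not from the answer):

```python
import multiprocessing as mp

def worker(jobs, results):
    # Block until the master hands us a job; None is the shutdown signal.
    while True:
        job = jobs.get()
        if job is None:
            break
        results.put(f'done:{job}')   # a real worker would render here

def run_master(job_list, n_workers=2):
    # On Windows, guard the call to this with `if __name__ == '__main__':`.
    jobs, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(jobs, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for job in job_list:     # in the real script, this loop watches the folder
        jobs.put(job)
    for _ in procs:          # one sentinel per worker shuts them all down
        jobs.put(None)
    out = sorted(results.get() for _ in job_list)
    for p in procs:
        p.join()
    return out
```

Because all workers pull from the same queue, a worker that finishes job1 simply picks up job5 next, which is the scheduling behavior the question asks for.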

Upvotes: 1
