tjmgis

Reputation: 1661

Python watchdog windows wait till copy finishes

I am using the Python watchdog module on a Windows Server 2012 machine to monitor new files appearing on a shared drive. When watchdog notices a new file, it kicks off a database restore process.

However, it seems that watchdog triggers the restore the second the file is created, without waiting until the file has finished copying to the shared drive. So I changed the event to on_modified, but two on_modified events fire: one when the file starts being copied and one when the copy is finished.

How can I handle the two on_modified events to only fire when the file being copied to the shared drive has finished?

What happens when multiple files are copied to the shared drive at the same time?

Here is my code

import time
import subprocess
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewFile(FileSystemEventHandler):
    def process(self, event):
        if event.is_directory:
            return

        if event.event_type == 'modified':
            if getext(event.src_path) == 'gz':
                load_pgdump(event.src_path)

    def on_modified(self, event):
        self.process(event)

def getext(filename):
    "Get the file extension"
    file_ext = filename.split(".",1)[1]
    return file_ext

def load_pgdump(src_path):    
    restore = 'pg_restore command ' + src_path
    subprocess.call(restore, shell=True)

def main():
    event_handler = NewFile()
    observer = Observer()
    observer.schedule(event_handler, path='Y:\\', recursive=True)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

if __name__ == '__main__':
    main()

Upvotes: 14

Views: 17748

Answers (10)

Rohit Chaku

Reputation: 47

Following up on ravenwing's answer, more details about on_closed in watchdog can be found here. As mentioned in the linked issue, there is no documentation available for on_closed yet, and it can only be used on Unix.

Upvotes: 1

ravenwing

Reputation: 837

On Linux you also get a close event. The solution would then be to hold off processing the file until it gets closed. My approach would be to add on_closed handling.

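from watchdog.events import FileSystemEventHandler

class Handler(FileSystemEventHandler):
    def __init__(self):
        super().__init__()
        self.files_to_process = set()

    def dispatch(self, event):
        # Route only the two event types we care about; ignore the rest.
        _method_map = {
            'created': self.on_created,
            'closed': self.on_closed
        }
        handler = _method_map.get(event.event_type)
        if handler is not None and not event.is_directory:
            handler(event)

    def on_created(self, event):
        # Remember the file; it is not safe to process yet.
        self.files_to_process.add(event.src_path)

    def on_closed(self, event):
        # Process only files we saw being created, once the writer closes them.
        if event.src_path in self.files_to_process:
            self.files_to_process.remove(event.src_path)
            actual_processing(event.src_path)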

Upvotes: 3

CLipp

Reputation: 108

Old I know, but I recently came up with a solution for this exact problem. In my case, I was only concerned with wav and mp3 files. This function will ensure that only files that are completely copied will be sent to makerCore() because the created placeholder files do not have any extension and will always end up in 'not ready'. Once the file is completed it will trigger the watchdog module again except this time with an extension. This will work on multiple files simultaneously as well.

def on_created(event):
    #print(event)
    if str(event.src_path).endswith('.mp3') or str(event.src_path).endswith('.wav'):
        makerCore(event)
    else:
        print('not ready')

Upvotes: 1

b3d

Reputation: 1

I've tried the check filesize - wait - check again routine many have suggested above but it's not very reliable. To make it work better I've added a check if the file is still locked.

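import os
import time

def wait_until_copied(file_path):
    # Step 1: wait until the size stops changing between two checks.
    file_size = -1
    while file_size != os.path.getsize(file_path):
        file_size = os.path.getsize(file_path)
        time.sleep(1)

    # Step 2: on Windows, renaming a file onto itself usually fails while
    # another process still holds it open, so retry until it succeeds.
    while True:
        try:
            os.rename(file_path, file_path)
            return True
        except OSError:
            time.sleep(1)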

Upvotes: 0

sathish

Reputation: 161

This works for me. Tested in windows as well with python3.7

size_past = -1
while True:
    size_now = os.path.getsize(event.src_path)
    if size_now == size_past:
        log.debug("file has copied completely now size: %s", size_now)
        break
    else:
        size_past = size_now
        log.debug("file copying size: %s", size_past)
        # Pause between checks; a sleep placed after `break` would never run.
        time.sleep(1)

Upvotes: 1

Yohan Obadia

Reputation: 2682

I am using a different approach that might not be the most elegant one but is easy to do on any platform if you have control over the side copying the file.

Just add 'in-progress' to the name of the file until the copying is complete, and then rename the file. You can then have a while loop waiting for the file with the name without 'in-progress' to exist and you're good.
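
A minimal sketch of this rename approach, assuming you control the copying side and that the temporary and final names live in the same directory. The `.in-progress` suffix and the function names `copy_then_rename` and `wait_for_final_name` are made up for illustration.

```python
import os
import shutil
import time

IN_PROGRESS_SUFFIX = ".in-progress"  # hypothetical marker, pick any name you like


def copy_then_rename(src, dst):
    """Sender side: copy under a temporary name, then rename when complete."""
    tmp = dst + IN_PROGRESS_SUFFIX
    shutil.copyfile(src, tmp)  # the slow part happens under the temporary name
    os.replace(tmp, dst)       # the final name only appears once the copy is done
    return dst


def wait_for_final_name(dst, poll_seconds=0.5, timeout=60.0):
    """Receiver side: poll until the final name exists or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(dst):
            return True
        time.sleep(poll_seconds)
    return False
```

On the watching side, the handler can ignore any path that ends with the in-progress suffix; the rename typically shows up as a separate moved event for the final name.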

Upvotes: 0

Dmytro

Reputation: 1390

I'm using the following code to wait until the file has been copied (Windows only):

from ctypes import windll
import time

def is_file_copy_finished(file_path):
    finished = False

    GENERIC_WRITE         = 1 << 30
    FILE_SHARE_READ       = 0x00000001
    OPEN_EXISTING         = 3
    FILE_ATTRIBUTE_NORMAL = 0x80

    # CreateFileW expects a unicode string, so decode byte-string paths first.
    if isinstance(file_path, bytes):
        file_path_unicode = file_path.decode('utf-8')
    else:
        file_path_unicode = file_path

    h_file = windll.Kernel32.CreateFileW(file_path_unicode, GENERIC_WRITE, FILE_SHARE_READ, None, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None)

    # CreateFileW returns INVALID_HANDLE_VALUE (-1) when the open fails.
    if h_file != -1:
        windll.Kernel32.CloseHandle(h_file)
        finished = True

    print('is_file_copy_finished: ' + str(finished))
    return finished

def wait_for_file_copy_finish(file_path):
    while not is_file_copy_finished(file_path):
        time.sleep(0.2)

wait_for_file_copy_finish(r'C:\testfile.txt')

The idea is to try to open the file for writing in share-read mode. The open will fail if someone else is still writing to it.

Enjoy ;)

Upvotes: 4

Mtl Dev

Reputation: 1622

In your on_modified event, just wait until the file is finished being copied, via watching the filesize.

Offering a Simpler Loop:

historicalSize = -1
while (historicalSize != os.path.getsize(filename)):
  historicalSize = os.path.getsize(filename)
  time.sleep(1)
print("file copy has now finished")
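
The same idea can be wrapped in a function with a timeout, so a stalled copy does not block forever. This is only a sketch: the name `wait_for_stable_size` and its parameters are illustrative, and a size that stops changing is a heuristic, not proof that the writer has closed the file.

```python
import os
import time


def wait_for_stable_size(path, interval=1.0, stable_checks=2, timeout=300.0):
    """Return the size once it matched the previous poll `stable_checks`
    times in a row, or None if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        if size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return size
        else:
            # Size moved (or this is the first poll): start counting again.
            unchanged = 0
            last_size = size
        time.sleep(interval)
    return None
```

Requiring more than one unchanged poll makes the check a little more tolerant of copies that pause briefly, at the cost of a longer wait.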

Upvotes: 12

jake77

Reputation: 2034

I had a similar issue recently with watchdog. A rather simple but not very smart workaround for me was to check the change of file size in a while loop using a two-element list, one element for 'past', one for 'now'. Once the values are equal, the copying is finished.

Edit: something like this.

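import os
import time

# `filename` is the path reported by the watchdog event.
value = [-1, os.path.getsize(filename)]  # [past, now]
while True:
    # test
    if value[0] == value[1]:
        break
    # change: wait, then shift 'now' into 'past' and measure again
    time.sleep(1)
    value = [value[1], os.path.getsize(filename)]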

Upvotes: 1

iri

Reputation: 744

I would add a comment as this isn't an answer to your question but a different approach... but I don't have enough rep yet. You could try monitoring the file size; if it stops changing, you can assume the copy has finished:

copying = True
size2 = -1
while copying:
    size = os.path.getsize('name of file being copied')
    if size == size2:
        break
    else:
        size2 = os.path.getsize('name of file being copied')
        time.sleep(2)
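
Since on_modified fires more than once per file and several files may arrive together, one option is to run this wait in a background thread per file and ignore repeat events for a path that is already being waited on. The following is a sketch under those assumptions; the class `CopyWaiter` and its `submit` method are hypothetical names, not part of watchdog.

```python
import os
import threading
import time


class CopyWaiter:
    """Runs `callback(path)` once the size of `path` stops changing."""

    def __init__(self, callback, interval=2.0):
        self._callback = callback
        self._interval = interval
        self._in_flight = set()
        self._lock = threading.Lock()

    def submit(self, path):
        """Call from the event handler; returns None for a duplicate event."""
        with self._lock:
            if path in self._in_flight:
                return None
            self._in_flight.add(path)
        worker = threading.Thread(
            target=self._wait_then_process, args=(path,), daemon=True
        )
        worker.start()
        return worker

    def _wait_then_process(self, path):
        try:
            last_size = -1
            while True:
                size = os.path.getsize(path)
                if size == last_size:
                    break
                last_size = size
                time.sleep(self._interval)
            self._callback(path)
        finally:
            # Allow the same path to be handled again if it is copied later.
            with self._lock:
                self._in_flight.discard(path)
```

In the question's handler, on_modified would call something like `waiter.submit(event.src_path)` with `load_pgdump` as the callback, so the observer thread is never blocked by the polling loop.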

Upvotes: 3
