emish

Reputation: 2853

How do you check when a file is done being copied in Python?

I'd like to figure out a way to alert a python script that a file is done copying. Here is the scenario:

  1. A folder, to_print, is watched by the script, which polls it constantly with os.listdir().

  2. Whenever os.listdir() returns a file that hasn't been seen before, the script performs some operations on that file, including opening it and manipulating its contents.

This is fine when the file is small, and copying the file from its original source to the directory being watched takes less time than the amount of time remaining until the next poll by os.listdir(). However, if a file is polled and found, but it is still in the process of being copied, then the file contents are corrupt when the script tries to act on it.

Instead, I'd like to be able to tell (using os.stat or otherwise) that a file is currently being copied and, if so, wait until the copy is done before acting on it.

My current idea is to call os.stat() every time I find a new file, then wait until the next poll and compare the modified/created times with the values from the previous poll. If they are unchanged, the file is "stable"; otherwise I keep polling until it is. I'm not sure this will work, though, as I am not too familiar with how Linux/Unix updates these values.
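The idea described above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution: the helper names snapshot and stable_files are hypothetical, and it compares file size as well as modification time, which is an added assumption. On Linux the modification time is updated when a file is written to, so an unchanged pair across two polls suggests the copy has finished, but a copy that has merely stalled would look the same.

```python
import os


def snapshot(path):
    """Return the (size, mtime) pair used to detect changes between polls."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime)


def stable_files(directory, previous):
    """Compare this poll's snapshots with the previous poll's.

    Returns (stable, current): the paths whose size and mtime are unchanged
    since the previous poll, and the snapshot dict to pass in next time.
    """
    current = {}
    stable = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            current[path] = snapshot(path)
        except FileNotFoundError:
            continue  # file vanished between listdir() and stat()
        if previous.get(path) == current[path]:
            stable.append(path)
    return stable, current
```

The caller would keep the returned snapshot dict between polls and would still need its own record of files already processed, since a finished file stays "stable" on every later poll.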

Upvotes: 6

Views: 5501

Answers (2)

nalply

Reputation: 28717

Try inotify.

This is the standard Linux mechanism for watching files. For your use case the event IN_CLOSE_WRITE looks promising: it fires when a file that was opened for writing is closed. There is a Python library for inotify, pyinotify. Here is a very simple example (taken from its documentation). You'll need to modify it to catch only IN_CLOSE_WRITE events.

# Example: loops monitoring events forever.
import pyinotify

# Instantiate a new WatchManager (will be used to store watches).
wm = pyinotify.WatchManager()

# Associate this WatchManager with a Notifier (will be used to report and
# process events).
notifier = pyinotify.Notifier(wm)

# Add a new watch on /tmp for ALL_EVENTS.
wm.add_watch('/tmp', pyinotify.ALL_EVENTS)  # <-- replace by IN_CLOSE_WRITE

# Loop forever and handle events.
notifier.loop()

Here is an extensive API documentation: http://seb-m.github.com/pyinotify/
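For illustration, the modification mentioned above might look something like the sketch below. The function name watch_close_write and the on_closed callback are hypothetical, and pyinotify is imported inside the function only so that the module can be loaded on systems where the library is not installed. Treat it as a starting point under those assumptions rather than a finished watcher.

```python
def watch_close_write(directory, on_closed):
    """Call on_closed(path) whenever a file opened for writing in
    directory is closed. Blocks forever."""
    import pyinotify  # third-party, Linux only; imported lazily

    class Handler(pyinotify.ProcessEvent):
        # pyinotify dispatches each event to a method named process_<EVENT>.
        def process_IN_CLOSE_WRITE(self, event):
            on_closed(event.pathname)

    wm = pyinotify.WatchManager()
    notifier = pyinotify.Notifier(wm, Handler())
    # Watch only for IN_CLOSE_WRITE instead of ALL_EVENTS.
    wm.add_watch(directory, pyinotify.IN_CLOSE_WRITE)
    notifier.loop()
```

Calling watch_close_write('to_print', print) would print the path of each file once its writer closes it. A copy tool that closes and reopens the file partway through would trigger the callback early, so this relies on the copier writing the file in a single open/close.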

Upvotes: 3

kindall

Reputation: 184191

If a copy always finishes within one poll interval, just process the new files found by the previous poll before checking for new files again. In other words, instead of this:

while True:
    newfiles = check_for_new_files()
    process(newfiles)
    time.sleep(pollinterval)

Do this:

newfiles = []

while True:
    process(newfiles)
    newfiles = check_for_new_files()
    time.sleep(pollinterval)

Or just put the wait in the middle of the loop (same effect really):

while True:
    newfiles = check_for_new_files()
    time.sleep(pollinterval)
    process(newfiles)

Upvotes: 2
