lollercoaster

Reputation: 16523

python ensure file consistency

I have a Python 2.7.x process running in an infinite loop that monitors a folder on an Ubuntu server.

Whenever it finds a file, it checks the file against a set of known files that have been processed already, and acts accordingly. In pseudocode:

import os

found = set()  # names of files that have already been processed
while True:
    for name in os.listdir("<DIR>"):
        if name not in found:
            process_file(name, found)  # expected to add name to found

How can I make sure that a file is not still in the middle of being copied there? I wouldn't want to, say, take the MD5 sum of the file or open it with another process until I'm sure it's all there and ready.

Upvotes: 0

Views: 141

Answers (2)

Ben Whaley

Reputation: 34426

The safest solution is to use the Linux kernel's inotify API via the pyinotify library. Experiment with the IN_CREATE and IN_MOVED_TO events depending on your needs. Also note this blog post warning of some implementation problems with the pyinotify library.
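A minimal sketch of the inotify approach, assuming the third-party pyinotify package is installed. It reacts to IN_CLOSE_WRITE rather than IN_CREATE, on the assumption that the copying program writes the file and then closes it: IN_CREATE fires as soon as the file appears, while IN_CLOSE_WRITE fires when a file that was open for writing is closed. The names handle_ready_file and watch are hypothetical and stand in for the question's process_file and main loop.

```python
try:
    import pyinotify  # third-party, Linux only (assumed installed for watch())
except ImportError:
    pyinotify = None

found = set()  # paths that have already been handled


def handle_ready_file(path):
    """Hypothetical stand-in for the question's process_file().

    Returns True if the path was new, False if it was seen before.
    """
    if path in found:
        return False
    found.add(path)
    print("ready: %s" % path)
    return True


def watch(directory):
    """Block forever, handling files once they are closed or moved in."""
    if pyinotify is None:
        raise RuntimeError("pyinotify is not installed")

    class Handler(pyinotify.ProcessEvent):
        # Fired when a file that was open for writing is closed.
        def process_IN_CLOSE_WRITE(self, event):
            handle_ready_file(event.pathname)

        # Fired when a file is renamed or moved into the directory.
        def process_IN_MOVED_TO(self, event):
            handle_ready_file(event.pathname)

    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO
    wm.add_watch(directory, mask)
    pyinotify.Notifier(wm, Handler()).loop()
```

Calling watch("<DIR>") replaces the polling loop from the question. A program that opens and closes the file several times while writing it would trigger IN_CLOSE_WRITE more than once, so this only helps when the writer closes the file exactly once, or writes elsewhere and moves the finished file in.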

Upvotes: 2

Blue Ice

Reputation: 7930

Due to locks and other system-level operations, you will not be able to do anything to the file until it has completed copying.

A file cannot be in two operations at once.

Upvotes: 2
