Reputation: 26396
(Python 3)
I have a process which drops inbound files into a directory (not written in Python).
A separate Python application periodically processes all files in the directory as follows:
import os

def getfilestobeprocessed(path):
    filestobeprocessed = []
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            filestobeprocessed.append({"filename": filename, "dirpath": dirpath})
    return filestobeprocessed
My concern is: what if the inbound process is halfway through writing a large file? What will my Python script do? Will it start to process the file when it should only be processing files that the inbound process has finished writing? Should I try to detect whether files are still open before I process them?
I would have considered using Pyinotify, except that this blog post criticises it: http://www.serpentine.com/blog/2008/01/04/why-you-should-not-use-pyinotify/
Upvotes: 1
Views: 141
Reputation: 3154
When you open (or rename, delete, ...) the file as part of processing it, you will get a "file in use" error; on Windows it is error code 32 (ERROR_SHARING_VIOLATION). If and when you see this error, just don't process that file - it will be taken care of on the next go-around.
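A minimal sketch of this approach (the .processing suffix is a hypothetical marker of my own; the winerror check only applies on Windows, where the writing process holds a sharing lock):

import os

def try_claim(path):
    # Renaming doubles as the in-use test: on Windows it fails with
    # ERROR_SHARING_VIOLATION (winerror 32) while the writer has the file open.
    claimed = path + ".processing"  # hypothetical marker suffix
    try:
        os.rename(path, claimed)
        return claimed
    except OSError as e:
        if getattr(e, "winerror", None) == 32:
            return None  # still being written; retry on the next pass
        raise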
Upvotes: 2
Reputation: 1673
Perhaps the OS can tell you whether another process has the file open.
A pretty good solution would be to decide that if, say, 60 seconds pass without the file changing (size or modification time), then whatever was writing to it has finished. Have a look at http://docs.python.org/3/library/os.html#os.stat.
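A sketch of that heuristic using the modification time from os.stat (the 60-second threshold is arbitrary; tune it to how your inbound process behaves):

import os
import time

def is_stable(path, quiet_seconds=60):
    # Treat the file as finished if nothing has modified it recently.
    mtime = os.stat(path).st_mtime
    return (time.time() - mtime) >= quiet_seconds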
Upvotes: 0
Reputation: 716
Use lock files.
While the file is being copied, write it under a name like filename.lock; when the copy completes, rename it to its final name.
Then add an if statement like:
import os

def getfilestobeprocessed(path):
    filestobeprocessed = []
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if not filename.endswith(".lock"):
                filestobeprocessed.append({"filename": filename, "dirpath": dirpath})
    return filestobeprocessed
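On the writer side the pattern would look like this (a Python sketch for illustration only, since the question says the inbound process isn't written in Python; deliver, src, and dest_dir are hypothetical names):

import os
import shutil

def deliver(src, dest_dir):
    # Copy into the inbox under a .lock name so the scanner ignores it,
    # then rename to the final name; rename is atomic on the same filesystem.
    final = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, final + ".lock")
    os.rename(final + ".lock", final)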
Upvotes: 1