Duke Dougal

Reputation: 26396

Python, is it safe to process files in a directory?

(Python 3)

I have a process which drops inbound files into a directory (not written in Python).

A separate Python application periodically processes all files in the directory as follows:

import os

def getfilestobeprocessed(path):
    # Collect every file under path, including files in subdirectories
    filestobeprocessed = []
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            filestobeprocessed.append({ "filename": filename, "dirpath": dirpath })
    return filestobeprocessed

My concern is: what if the inbound process is halfway through writing a large file? What will my Python script do? Will it start processing the file, when really it should only process files that the inbound process has finished writing? Should I be trying to detect whether files are open before I process them?

I would have considered using Pyinotify, except that this author criticises it: http://www.serpentine.com/blog/2008/01/04/why-you-should-not-use-pyinotify/

Upvotes: 1

Views: 141

Answers (3)

Marichyasana

Reputation: 3154

When you open (or rename, delete, ...) a file as part of processing it while the writer still holds it, you will get a "file in use" error. On Windows this is error code 32. If and when you see this error, just skip that file; it will be picked up on the next pass.
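A minimal sketch of that skip-and-retry idea. The names `process_if_available` and `handler` are made up for illustration. It assumes the writer holds the file in a way that makes `open` fail, which is typical of a Windows sharing violation; on Linux and other POSIX systems, opening a file that is still being written normally succeeds, so this check alone would not catch a partial file there.

```python
def process_if_available(filepath, handler):
    """Try to open filepath and pass the open file to handler.

    Returns True if the file was processed, False if it could not be
    opened and should be retried on the next pass.
    """
    try:
        f = open(filepath, "rb")
    except OSError:
        # Covers "file in use" (winerror 32 on Windows) as well as
        # files that disappeared between listing and opening.
        return False
    with f:
        handler(f)
    return True
```

Only the `open` call is wrapped in the `try`, so errors raised by the handler itself are not silently mistaken for a busy file.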

Upvotes: 2

jsf80238

Reputation: 1673

Perhaps the OS can tell you whether another process has the file open.

A reasonably good heuristic: if the file has not changed for, say, 60 seconds, assume that whatever was writing to it is no longer doing so. Have a look at os.stat: http://docs.python.org/3/library/os.html#os.stat
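A rough sketch of that check, using the modification time reported by `os.stat`. The function name `is_quiet` and the `now` parameter are illustrative assumptions, and the 60-second threshold is just the figure suggested above; a writer that stalls for longer than the threshold would still fool it.

```python
import os
import time

def is_quiet(filepath, quiet_seconds=60, now=None):
    """Return True if filepath was last modified at least quiet_seconds ago."""
    if now is None:
        now = time.time()
    # st_mtime is the last modification time, in seconds since the epoch
    last_modified = os.stat(filepath).st_mtime
    return now - last_modified >= quiet_seconds
```

The directory scan could then process only the files for which `is_quiet(os.path.join(dirpath, filename))` is true and leave the rest for the next run.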

Upvotes: 0

Patrick Eaton

Reputation: 716

Use lock files.

While the inbound process is copying a file, have it write to filename.lock, and when the copy completes, rename the file to its correct extension.

Then add an if statement to skip the .lock files, like this:

import os

def getfilestobeprocessed(path):
    filestobeprocessed = []
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            # Skip files that are still being written
            if not filename.endswith(".lock"):
                filestobeprocessed.append({ "filename": filename, "dirpath": dirpath })
    return filestobeprocessed
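For the writing side, here is a sketch of the same convention in Python, purely as an illustration, since the inbound process in the question is not written in Python. The function name `write_then_publish` is hypothetical, and appending `.lock` to the full file name is one possible reading of the naming scheme. It assumes the temporary and final names are in the same directory, so the rename stays on one filesystem.

```python
import os

def write_then_publish(final_path, data):
    """Write data under a temporary .lock name, then rename to final_path."""
    temp_path = final_path + ".lock"
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the contents to disk before renaming
    # The final name only appears once the content is complete
    os.replace(temp_path, final_path)
```

Because the reader ignores names ending in `.lock`, it never sees a file under its final name until the write has finished.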

Upvotes: 1
