harijay

Reputation: 11883

Use Python select.kqueue on OS X to monitor file creation by an external application

Typically, transcoding one of my hour-long audio recording sessions to an MP3 file takes twenty-odd minutes.

I want a Python script to run some Python code when the OS X application GarageBand finishes writing that MP3 file.

What are the best ways in Python to detect that an external application has finished writing data to a file and has closed it? I read about kqueue and epoll, but since I have no background in OS event detection and couldn't find a good example, I am asking for one here.

The code I am using right now does the following, and I am looking for something more elegant:

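import time

while True:
    try:
        today_file = open("todays_recording.mp3", "rb")
    except IOError:
        print("File not ready yet... continuing to wait")
        time.sleep(5)  # avoid a busy loop while waiting
        continue
    with today_file:  # close the file once processing is done
        my_custom_function_to_process_file(today_file)
    break  # stop polling after the file has been processed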

Upvotes: 3

Views: 934

Answers (2)

Zac B

Reputation: 4232

The answer is a bit nuanced.

How to watch for file writes using kqueue/kevent

Technically, you can wait for data to be written to a file like this:

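from select import kqueue, kevent, KQ_FILTER_VNODE, KQ_NOTE_WRITE

fh = open('test.txt', 'r')
kqh = kqueue()
# Register a vnode filter for writes to test.txt, then block until at most
# 1 event arrives or 10 seconds pass, whichever comes first.
res = kqh.control([kevent(fh, KQ_FILTER_VNODE, fflags=KQ_NOTE_WRITE)], 1, 10)
print(res)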

After a write is made to test.txt, res will be a list containing the write event (at most one, because max_events is 1). If 10 seconds go by without a write being detected, res will be an empty list. If the watch queue overflows (extremely unlikely here, since you're watching a single file rather than, e.g., a directory of thousands of files), an overflow event will be returned in res.
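A minimal sketch that wraps the call above in a function and makes the two outcomes explicit: a non-empty list means a write happened, an empty list means the timeout expired. It assumes a macOS or BSD build of Python, since select.kqueue does not exist on Linux or Windows; the function name wait_for_write is hypothetical, not part of any library.

```python
import select


def wait_for_write(path, timeout):
    """Block until `path` is written to, or until `timeout` seconds pass.

    Returns the list of kevent objects (empty on timeout). Assumes a
    platform where select.kqueue exists (macOS / BSD).
    """
    if not hasattr(select, "kqueue"):
        raise OSError("select.kqueue is not available on this platform")
    with open(path, "rb") as fh:
        kq = select.kqueue()
        try:
            # Register a vnode watch for writes on the open file descriptor.
            ev = select.kevent(
                fh,
                filter=select.KQ_FILTER_VNODE,
                flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                fflags=select.KQ_NOTE_WRITE,
            )
            # Wait for at most one event, for at most `timeout` seconds.
            return kq.control([ev], 1, timeout)
        finally:
            kq.close()
```

Usage: `events = wait_for_write('test.txt', 10)`, then `if not events:` handle the timeout case.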

But is that a good idea?

Why watching for writes is a bad idea

Watching for writes to determine when an external program is "done" with a file is generally a poor idea. Just because a write occurs doesn't mean that the program (GarageBand in this case) isn't going to make any more writes. Worse, from the perspective of that other program, its function call to write data may have already succeeded, but underlying buffering (at the language platform or OS level) may cause the notification to be delivered later or not at all in some rare cases.

If all you watch for is writes, then you end up having to poll the file to see if it's completely written (reinventing the code you already have) or use fallible and over-specific heuristics to guess when the writer is done (e.g. "writes were observed, then no more writes for 1 second; that probably means the writer is finished ... or else it means the computer's under heavy load and GarageBand is taking a long time to encode the next chunk of MP3 data before writing it").
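To make that heuristic concrete, here is a minimal sketch that polls os.stat() instead of using kqueue and declares the file finished once its size and modification time have not changed for quiet_seconds. The function name and its parameters are hypothetical, and the sketch inherits exactly the weakness described above: an encoder that pauses for longer than quiet_seconds will be mistaken for a finished one.

```python
import os
import time


def wait_until_quiet(path, quiet_seconds=5.0, poll_interval=0.5, timeout=None):
    """Return True once `path` exists and its size/mtime have been unchanged
    for `quiet_seconds`; return False if `timeout` seconds elapse first.

    Only a heuristic: it cannot tell "finished" apart from "paused".
    """
    start = time.monotonic()
    last_sig = None      # (size, mtime_ns) seen on the previous poll
    stable_since = None  # when last_sig was first observed
    while True:
        now = time.monotonic()
        if timeout is not None and now - start > timeout:
            return False
        try:
            st = os.stat(path)
            sig = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            sig = None   # not created yet; keep waiting
        if sig is not None and sig == last_sig:
            if now - stable_since >= quiet_seconds:
                return True
        else:
            # First sighting, or the file changed: restart the quiet period.
            last_sig, stable_since = sig, now
        time.sleep(poll_interval)
```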

Watching for closes

That brings us to the second part of your question: can you watch for an external program closing a file? If we can, we get a much more reliable hint that the external program is done working with a given file.

The answer is yes ... but not easily in Python, and not at all on macOS.

kqueue(2) supports the NOTE_CLOSE and NOTE_CLOSE_WRITE fflags, which fire when a read-only or a writable handle to a file is closed, respectively. However, the select module in the Python stdlib doesn't expose those flags, as of the latest version at the time of this writing (3.12).

Fortunately, this is an old BSD API that is unlikely to change, so grabbing the raw values of those flags from the kernel source of a BSD (I found them in the NetBSD source) is easy:

#define NOTE_CLOSE  0x0100U         /* file closed (no FWRITE) */
#define NOTE_CLOSE_WRITE 0x0200U        /* file closed (FWRITE) */

Those values are 256 and 512, whether read as unsigned or signed 16- or 32-bit integers, so we should be able to wait for them manually, like this:

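from select import kqueue, kevent, KQ_FILTER_VNODE

# Not exported by the select module; raw values from the NetBSD headers.
KQ_NOTE_CLOSE = 256        # 0x0100: file closed (no FWRITE)
KQ_NOTE_CLOSE_WRITE = 512  # 0x0200: file closed (FWRITE)

fh = open('test.txt', 'r')
kqh = kqueue()
# Wait up to 10 seconds for a read-only or a writable handle to be closed.
res = kqh.control([kevent(fh, KQ_FILTER_VNODE, fflags=KQ_NOTE_CLOSE_WRITE | KQ_NOTE_CLOSE)], 1, 10)
print(res)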

However, that doesn't work (it doesn't wake up when other programs close the file), because NOTE_CLOSE and NOTE_CLOSE_WRITE are not available on macOS. Unfortunately, the macOS-native FSEvents file-monitoring API doesn't seem to publish events relating to file closure either.

The verdict is that this is not possible on macOS, but is likely possible on other BSDs (with a bit of unofficial mucking about with select.kqueue flag values).
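On a BSD where these notes are actually delivered, the fflags of the returned event tell you which kind of close happened. A minimal sketch of composing and decoding that mask, assuming the NetBSD values quoted above; describe_close_flags and CLOSE_MASK are hypothetical names, and the getattr lookup only guards against a future select module exporting the constants itself (an assumption, not current behavior).

```python
import select

# Prefer a stdlib constant if this Python build happens to define one;
# otherwise fall back to the raw NetBSD values (0x0100 and 0x0200).
KQ_NOTE_CLOSE = getattr(select, "KQ_NOTE_CLOSE", 0x0100)
KQ_NOTE_CLOSE_WRITE = getattr(select, "KQ_NOTE_CLOSE_WRITE", 0x0200)

# Bitwise OR combines both notes into one fflags value for registration.
CLOSE_MASK = KQ_NOTE_CLOSE | KQ_NOTE_CLOSE_WRITE


def describe_close_flags(fflags):
    """Return the names of the close-related bits set in an event's fflags."""
    names = []
    if fflags & KQ_NOTE_CLOSE:
        names.append("NOTE_CLOSE")        # a read-only handle was closed
    if fflags & KQ_NOTE_CLOSE_WRITE:
        names.append("NOTE_CLOSE_WRITE")  # a writable handle was closed
    return names
```

For the GarageBand case, NOTE_CLOSE_WRITE is the interesting bit: it signals that a handle opened for writing has been closed.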

Upvotes: 0

Matt Billenstein

Reputation: 678

You could run lsof (e.g. via popen) and filter by either the process or the file you're interested in...
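A minimal sketch of that idea, using subprocess.run rather than popen and assuming lsof is on the PATH (it ships with macOS). `lsof -t` prints only the PIDs that have the file open, so empty output means no process currently holds it. The helper names are hypothetical, and this is still polling: a writer that closes and reopens the file between checks can fool it.

```python
import subprocess
import time


def pids_with_file_open(path):
    """Return the set of PIDs that lsof reports as having `path` open."""
    # -t: terse output, one PID per line. lsof exits non-zero when no
    # process has the file open, so rely on stdout rather than the exit code.
    result = subprocess.run(
        ["lsof", "-t", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return {int(tok) for tok in result.stdout.split() if tok.isdigit()}


def wait_until_closed(path, poll_interval=2.0):
    """Poll lsof until no process has `path` open."""
    while pids_with_file_open(path):
        time.sleep(poll_interval)
```

Usage: once the MP3 exists, call `wait_until_closed("todays_recording.mp3")` before processing it.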

Upvotes: 1
