Reputation: 3232
I have a library that interacts with a configuration file. When the library is imported, the initialization code reads the configuration file, possibly updates it, and then writes the updated contents back to the file (even if nothing was changed).
Very occasionally, I encounter a problem where the contents of the configuration file simply disappear. Specifically, this happens when I run many invocations of a short script (using the library), back-to-back, thousands of times. It never occurs in the same directories, which leads me to believe it's a somewhat random problem, specifically a race condition with I/O.
This is a pain to debug, since I can never reliably reproduce the problem and it only happens on some systems. I have a suspicion about what might happen, but I wanted to see if my picture of file I/O in Python is correct.
So the question is: when does a Python program actually write file contents to disk? I thought that the contents would make it to disk by the time the file was closed, but then I can't explain this error. When Python closes a file, does it flush the contents to the disk itself, or simply queue them up with the filesystem? Is it possible that file contents can be written to disk after Python terminates? And can I avoid this issue by using fp.flush(); os.fsync(fp.fileno()) (where fp is the file handle)?
If it matters, I'm programming on a Unix system (Mac OS X, specifically). Edit: Also, keep in mind that the processes are not running concurrently.
Appendix: Here is the specific race condition that I suspect:
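(The original diagram is not reproduced here; based on the question above, the suspected sequence is presumably along these lines:)

    1. Invocation N opens the configuration file for writing, truncating it.
    2. Invocation N writes the new contents and closes the file; close()
       queues the data with the filesystem rather than forcing it to disk.
    3. Invocation N exits, and invocation N+1 starts and opens the same file.
    4. Invocation N+1 reads before the queued data reaches the disk, and
       therefore sees an empty file.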
Upvotes: 4
Views: 2233
Reputation: 24110
It is almost certainly not Python's fault. If Python closes the file, or exits cleanly (rather than being killed by a signal), then the OS will have the new contents for the file. Any subsequent open should return the new contents. There must be something more complicated going on. Here are some thoughts.
What you describe sounds more likely to be a filesystem bug than a Python bug, and a filesystem bug is pretty unlikely.
Filesystem bugs are far more likely if your files actually reside in a remote filesystem. Do they?
Do all the processes use the same file? Do "ls -li" on the file to see its inode number, and see if it ever changes. In your scenario, it should not. Is it possible that something is moving files, or moving directories, or deleting directories and recreating them? Are there symlinks involved?
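For example, a quick way to record the inode from Python and compare it across runs (a sketch; the path is a placeholder):

    import os

    st = os.stat("config.cfg")   # placeholder path
    print(st.st_dev, st.st_ino)  # device and inode; in your scenario these should never change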
Are you sure that there is no overlap in the running of your programs? Are any of them run from a shell with "&" at the end (i.e. in the background)? That could easily mean that a second one is started before the first one is finished.
Are there any other programs writing to the same file?
This isn't your question, but if you need atomic changes (so that any program running in parallel only sees either the old version or the new one, never the empty file), the way to achieve it is to write the new content to another file (e.g. "foo.tmp"), then do os.rename("foo.tmp", "foo"). Rename is atomic.
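A minimal sketch of that pattern, assuming POSIX semantics (os.rename atomically replaces an existing target on the same filesystem); the file names and contents are placeholders:

    import os

    new_contents = "key = value\n"  # placeholder contents

    # Write the complete new version to a temporary file in the same
    # directory (rename is only atomic within a single filesystem).
    with open("foo.tmp", "w") as fp:
        fp.write(new_contents)
        fp.flush()
        os.fsync(fp.fileno())       # make sure the data is on disk before the rename

    # Atomically replace the old file; readers see either the old contents
    # or the new contents, never an empty file.
    os.rename("foo.tmp", "foo")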
Upvotes: 2