Nikratio

Reputation: 2449

Python file objects, closing, and destructors

The description of tempfile.NamedTemporaryFile() says:

If delete is true (the default), the file is deleted as soon as it is closed.

In some circumstances, this means that the file is not deleted after the Python interpreter ends. For example, when running the following test under py.test, the temporary file remains:

from __future__ import division, print_function, absolute_import
import tempfile
import unittest2 as unittest
class cache_tests(unittest.TestCase):
    def setUp(self):
        self.dbfile = tempfile.NamedTemporaryFile()
    def test_get(self):
        self.assertEqual('foo', 'foo')

In a way this makes sense, because this program never explicitly closes the file object. The only other way for the object to get closed would presumably be in the __del__ destructor, but here the language reference states that "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits." So everything is consistent with the documentation so far.

However, I'm confused about the implications of this. If it is not guaranteed that file objects are closed on interpreter exit, can it possibly happen that some data that was successfully written to a (buffered) file object is lost even though the program exits gracefully, because it was still in the file object's buffer, and the file object never got closed?

Somehow that seems very unlikely and un-pythonic to me, and the open() documentation doesn't contain any such warnings either. So I (tentatively) conclude that file objects are, after all, guaranteed to be closed.

But how does this magic happen, and why can't NamedTemporaryFile() use the same magic to ensure that the file is deleted?

Edit: Note that I am not talking about file descriptors here (which are buffered by the OS and closed by the OS on program exit), but about Python file objects that may implement their own buffering.

Upvotes: 6

Views: 4458

Answers (3)

Armin Rigo

Reputation: 12900

On Windows, NamedTemporaryFile uses a Windows-specific extension (os.O_TEMPORARY) to ensure that the file is deleted when it is closed. This probably also works if the process is killed in any way. However, there is no obvious equivalent on POSIX, most likely because on POSIX you can simply delete files that are still in use: doing so only removes the name, and the file's content is removed only once the file is closed (in any way). But if we want the file name to persist until the file is closed, as with NamedTemporaryFile, then we do need "magic".
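The POSIX behaviour mentioned above can be sketched roughly as follows. This is an illustration for POSIX systems only (on Windows, unlinking a file that is still open is expected to fail), and it uses tempfile.mkstemp instead of NamedTemporaryFile so that nothing else tries to remove the name later.

```python
import os
import tempfile

# mkstemp returns an open OS-level descriptor and the file's path.
fd, path = tempfile.mkstemp()

with os.fdopen(fd, "w+b") as f:
    # On POSIX this removes only the directory entry (the name);
    # the open file itself stays usable through the handle.
    os.unlink(path)
    name_gone = not os.path.exists(path)

    f.write(b"still usable")
    f.seek(0)
    data = f.read()
# Leaving the with-block closes the file; only then is its content released.
```

This is why a plain (unnamed) temporary file needs no cleanup logic on POSIX, while a named one does.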

We cannot use the same magic as for flushing buffered files. In Python 2, the C library handles that: the files are FILE objects in C, and the C library guarantees that they are flushed on normal program exit (but not if the process is killed). In Python 3, there is custom C code to achieve the same effect. But it is specific to this use case, not anything directly reusable.
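A rough way to observe the difference between a normal exit and an abrupt one is sketched below, assuming CPython 3 (other interpreters may behave differently). The child script and the run_child helper are made up for this illustration. The child writes to a file without flushing or closing it, then either exits normally or calls os._exit, which skips interpreter shutdown.

```python
import os
import subprocess
import sys
import tempfile

# Child program: write without flushing or closing, then exit.
CHILD = """
import os, sys
f = open(sys.argv[1], "w")
f.write("buffered data")
if sys.argv[2] == "hard":
    os._exit(0)  # skips interpreter shutdown, so the buffer is never flushed
"""


def run_child(mode):
    """Run the child in the given mode and return what reached the file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        subprocess.check_call([sys.executable, "-c", CHILD, path, mode])
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


after_normal_exit = run_child("normal")
after_hard_exit = run_child("hard")
```

On CPython, after_normal_exit should contain the written text, while after_hard_exit stays empty because the data never left the Python-level buffer.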

That's why NamedTemporaryFile uses a custom __del__. And indeed, __del__ methods are not guaranteed to be called when the interpreter exits. (We can prove it with a global cycle of references that also references a NamedTemporaryFile instance, or by running PyPy instead of CPython.)

As a side note, NamedTemporaryFile could be implemented a bit more robustly, e.g. by registering itself with atexit to ensure that the file name is removed then. But you can call it yourself too: if your process doesn't use an unbounded number of NamedTemporaryFiles, it's simply atexit.register(my_named_temporary_file.close).
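A minimal sketch of that atexit approach might look like this. The helper name make_tracked_tempfile is hypothetical, and the explicit close() at the end is only there to show that an early close does not conflict with the registered one.

```python
import atexit
import os
import tempfile


def make_tracked_tempfile():
    """Create a NamedTemporaryFile whose close() also runs at interpreter exit."""
    tmp = tempfile.NamedTemporaryFile()
    # Closing an already-closed NamedTemporaryFile is harmless, so the
    # registration is safe even if the caller closes the file earlier.
    atexit.register(tmp.close)
    return tmp


tmp = make_tracked_tempfile()
path = tmp.name
tmp.write(b"scratch data")
tmp.flush()
file_existed_while_open = os.path.exists(path)

tmp.close()  # optional early close; the atexit call later becomes a no-op
file_exists_after_close = os.path.exists(path)
```

Note that each registration keeps a reference to the file object until exit, which is why this only suits programs that create a bounded number of temporary files.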

Upvotes: 15

Yuushi

Reputation: 26040

On any version of *nix, all file descriptors are closed when a process finishes, and this is taken care of by the operating system. Windows is likely exactly the same in this respect. Without digging in the source code, I can't say with 100% authority what actually happens, but likely what happens is:

  • If delete is False, unlink() (or a function similar to it on other operating systems) is called. This means that the file will automatically be deleted when the process exits and there are no more open file descriptors. While the process is running, the file will still remain around.

  • If delete is True, likely the C function remove() is used. This will forcibly delete the file before the process exits.

Upvotes: 1

perreal

Reputation: 97948

The file buffering is handled by the operating system. If you do not close a file after you open it, it is because you are assuming that the operating system will flush the buffer and close the file after the owner exits. This is not Python magic; this is your OS doing its thing. The __del__() method is related to Python and requires explicit calls.

Upvotes: -1
