Rich

Reputation: 12663

Is Python retaining a reference to a file opened in a list?

I have a program where I need to keep some objects that open files on disk in a list, and delete those files after the program is done. However, Python seems to be keeping a file open even though there are no more references to the object that should have it open. I've been able to recreate the problem with plain file objects below:

import os

filenames = ['a.txt', 'b.txt']
files = [open(f,'w') for f in filenames]
for f_object in files:
    f_object.write("test")

del files[:]

for name in filenames:
    os.remove(name)

When I run this on Windows I get the error

Traceback (most recent call last):
  File ".\file_del.py", line 11, in <module>
    os.remove(name)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'b.txt'

Interestingly, it is able to delete a.txt without a problem. What is causing b.txt to remain open even though the references to it are gone?

Update

In the original problem, I don't have access to the files to close them. Trust me, I would love to close those files. See the following:

base_uri = 'dem'
out_uri = 'foo.tif'
new_raster_from_base_uri(base_uri, out_uri, 'GTiff', -1, gdal.GDT_Float32)

ds = []
for filename in [out_uri]:
    ds.append(gdal.Open(filename, gdal.GA_Update))
band_list = [dataset.GetRasterBand(1) for dataset in ds]
for band in band_list:
    for row_index in xrange(band.YSize):
        a = numpy.zeros((1, band.XSize))
        band.WriteArray(a, 0, row_index)

for index in range(len(ds)):
    band_list[index] = None
    ds[index] = None

del ds[:]

os.remove(out_uri)

Update 2

I've marked millimoose's answer below as the correct one, since it fixes the abstracted problem with files that I presented here. Unfortunately, it didn't work with the GDAL objects I was using. For future reference, I dug deep and found the undocumented gdal.Dataset.__destroy_swig__(ds) function, which seems to at least close the file that the dataset is associated with. I call that first, before deleting the file on disk associated with the datasets, and that seems to work.

Upvotes: 3

Views: 1275

Answers (4)

Vorticity

Reputation: 4926

Millimoose is correct that f_object is still holding a reference to the last file in the list. You simply need to reset or delete that variable. I have run into much weirder situations where references were inexplicably being held onto in the past. Below is a method that can be used to test whether all references have been garbage collected or not. Please note, this method of using weakrefs will cause you no end of headaches if you attempt to use it from within IPython.

#!/usr/bin/env python

import weakref
from sys import getrefcount

#Open two lists of files
f1 = [file('temp1.txt','w'), file('temp2.txt','w')]
f2 = [file('temp3.txt','w'), file('temp4.txt','w')]

#Loop over both to create arrays of weak references
weak_f1 = [weakref.ref(x) for x in f1]
weak_f2 = [weakref.ref(x) for x in f2]

#Note that x still contains a reference to f2[1]
#(in Python 2, a list comprehension leaks its loop variable into the enclosing scope)
print x

#Print the number of references for each file
print 'Note, temp4.txt has an extra reference.'
print 'temp1.txt ref count == %r' % getrefcount(weak_f1[0]())
print 'temp2.txt ref count == %r' % getrefcount(weak_f1[1]())
print 'temp3.txt ref count == %r' % getrefcount(weak_f2[0]())
print 'temp4.txt ref count == %r\n' % getrefcount(weak_f2[1]())

#Delete both arrays
print 'Deleting arrays.'
del f1[:]
del f2[:]

#Print the number of references again
print 'temp1.txt ref count == %r' % getrefcount(weak_f1[0]())
print 'temp2.txt ref count == %r' % getrefcount(weak_f1[1]())
print 'temp3.txt ref count == %r' % getrefcount(weak_f2[0]())
print 'temp4.txt ref count == %r\n' % getrefcount(weak_f2[1]())

#Note, temp4.txt still has two references while the others show MANY references
#This is because a reference to temp4.txt still exists in `x`.
#The other files show many references because their weakrefs now return `None`.
print 'All weak refs are now dead except the one still stored in `x`'
print weak_f1
print weak_f2, '\n'

#Delete `x` and this extra reference is gone
print 'Deleting `x`'
del x

#All references are now `None`
print 'Now we have lost our last file reference and all weakrefs are dead'
print weak_f1
print weak_f2

Upvotes: 3

millimoose

Reputation: 39970

The scope of the loop variable f_object is actually the surrounding function or module. That means it retains a reference to the last file from the iteration even if you clear the list. The following works properly:

import os

filenames = ['a.txt', 'b.txt']
files = [open(f,'w') for f in filenames]
for f_object in files:
    f_object.write("test")

del files[:]
# Nuke the last reference.
del f_object 

for name in filenames:
    os.remove(name)

I suppose in your original code it would be del band. Alternatively, move the loop into a function so the loop variable does not leak into the module scope:

import os

def write_to_files(files):
    for f_object in files:
        f_object.write("test")  

filenames = ['a.txt', 'b.txt']
files = [open(f,'w') for f in filenames]
write_to_files(files)

del files[:]

for name in filenames:
    os.remove(name)

Upvotes: 4

Adam Rosenfield

Reputation: 400454

You need to close the files with the file.close() method. Files do get closed automatically when the file object is garbage collected, but exactly when that happens is not guaranteed.

The preferred way of ensuring that files get closed deterministically, even in the face of exceptions, is to use a with statement, which treats the file as a context manager:

with open('filename') as f:
    # Do file operations on f
    ...

# At this scope, f is now closed, even if an exception was thrown

If you're on Python 2.5, you must write from __future__ import with_statement at the beginning of your program; if you're on Python 2.6 or later, then that's not necessary.
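A single with statement covers one file. For a list of files like the one in the question, one possible approach is a try/finally block that closes every handle before the files are removed. This is only a sketch: the temporary directory is an assumption added so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical scratch directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
filenames = [os.path.join(workdir, name) for name in ('a.txt', 'b.txt')]

files = []
try:
    for name in filenames:
        files.append(open(name, 'w'))
    for f_object in files:
        f_object.write("test")
finally:
    # Close every handle explicitly, even if opening or writing raised.
    for f_object in files:
        f_object.close()

# No handle is open any more, so removal no longer depends on
# which variables still reference the file objects.
for name in filenames:
    os.remove(name)
os.rmdir(workdir)

all_closed = all(f.closed for f in files)
```

Because the files are closed explicitly, it no longer matters that f_object still refers to the last file after the loop.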

Upvotes: 1

prgao

Reputation: 1787

You have to close the files:

for f_object in files:
    f_object.write("test")
    f_object.close()

Upvotes: 0
