embeddedPy

Reputation: 199

Is a Python script guaranteed to finish reading a file that is deleted mid-read?

I am running a Python script in bash for Windows that, simplified, does the following:

with open('large.txt', 'r') as infile:
    for line in infile.readlines():
        print(line)

The file that it reads is expected to be large. While the script is running, the file is deleted from Windows. In the examples I tried, the script still finishes printing the full content of the file.

For example, with a file produced by:

with open('large.txt', 'w') as outfile:
    for n in range(10000000):
        outfile.write('{}\n'.format(n))

Q: My concrete question is whether this behavior is guaranteed, that is, whether the script will manage to process the entire file. For example, what if the file is large enough that it fits on disk but not in memory?

Q: If it is not guaranteed, does this part of the script exit with an exception that I can catch to produce alternate behavior?

Upvotes: 0

Views: 43

Answers (2)

Sam Mason

Reputation: 16204

What Dietrich says is correct: if a process just "deletes" the file and no other process has the file open, then yes, you'll read to the end of the file (assuming no IO errors, etc.). But I thought it could be useful to point out a different but related issue.

If another process truncates the file, either before it is deleted, or through a file handle it still holds after the deletion, then your program will stop reading when it reaches the new end of the file.
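To make the truncation case concrete, here is a minimal sketch, assuming a POSIX-like environment such as WSL or Linux. The function name `count_lines_with_truncation`, the temporary file, and the explicit `buffering=4096` are illustrative choices of mine, not part of the question. The truncation happens in the same process here purely to keep the example self-contained.

```python
import os
import tempfile


def count_lines_with_truncation(total_lines=100000):
    """Write total_lines lines, truncate the file mid-read, count lines seen."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as outfile:
            for n in range(total_lines):
                outfile.write("{}\n".format(n))

        count = 0
        # A small explicit buffer keeps the amount of pre-read data bounded.
        with open(path, "r", buffering=4096) as infile:
            if infile.readline():
                count += 1
            # Simulates another process truncating the file to zero length.
            os.truncate(path, 0)
            # Lines already sitting in the reader's buffer are still returned;
            # after that the reader hits the new end of file and the loop ends.
            for line in infile:
                count += 1
        return count
    finally:
        os.remove(path)
```

In this sketch the loop ends quietly at the new end of the file without raising, so a truncated read shows up as a short line count rather than as an error.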

Upvotes: 1

Dietrich Epp

Reputation: 213638

I believe the answer differs depending on whether you are talking about native Windows or WSL.

On Windows, deleting a file actually "marks the file for deletion", but actual deletion will only happen once all handles are closed. The file is still on disk, even though it won't appear to be there if you look for it. You cannot create another file with the same name until the original file is completely deleted. You can continue reading the file until it is deleted.

WSL provides POSIX file semantics—with POSIX semantics, the file is instead “unlinked” and is not deleted until the last reference is gone. You can continue to read it as long as you have a reference to the file, and since the file has been completely unlinked, you can create a new file with the same name.
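The POSIX behavior described above can be sketched as follows, assuming the script runs under WSL or another POSIX system. The function name `read_after_unlink` and the temporary file are hypothetical, chosen only for illustration. On native Windows the `os.remove` call may instead fail with `PermissionError` while the file is still open, so treat this as a sketch for the POSIX case only.

```python
import os
import tempfile


def read_after_unlink(num_lines=1000):
    """Unlink a file while it is open, reuse its name, and keep reading."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as outfile:
        for n in range(num_lines):
            outfile.write("{}\n".format(n))

    lines = []
    with open(path, "r") as infile:
        lines.append(infile.readline())
        # Unlink the name; the open handle still references the data.
        os.remove(path)
        gone = not os.path.exists(path)
        # Under POSIX semantics the name is immediately free for reuse.
        with open(path, "w") as replacement:
            replacement.write("replacement\n")
        # The original handle keeps reading the original content.
        lines.extend(infile)

    with open(path, "r") as new_file:
        new_content = new_file.read()
    os.remove(path)  # clean up the replacement file
    return lines, gone, new_content
```

The reader sees every original line even though the name disappeared and was then reused for a different file.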

Q: My concrete question is whether this behavior is guaranteed, that is, whether the script will manage to process the entire file. For example, what if the file is large enough that it fits on disk but not in memory?

To answer your question: Yes, the script will finish processing the file.

The file is still on disk, not in memory.

On both Windows and WSL, the file is not actually deleted until all references are gone, but the semantics are a bit different.

Important Note

You said that the file is large, but consider this code:

with open('large.txt', 'r') as infile:
    for line in infile.readlines():
        print(line)

It reads the entire file into memory and then prints it out one line at a time. You probably want this instead:

with open('large.txt', 'r') as infile:
    for line in infile:
        print(line)

This reads only one line plus some buffered data at a time. If your file is large, this will make a difference.
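If you want to see the difference for yourself, here is a rough sketch that compares peak memory for the two loops using the standard `tracemalloc` module. The helper names (`peak_memory`, `consume_with_readlines`, `consume_by_iteration`, `compare_peaks`) and the generated test file are my own, and the exact numbers will vary by Python version and platform.

```python
import os
import tempfile
import tracemalloc


def consume_with_readlines(path):
    # Builds a list holding every line before the loop starts.
    with open(path, "r") as infile:
        return sum(len(line) for line in infile.readlines())


def consume_by_iteration(path):
    # Holds only the current line plus the file object's buffer.
    with open(path, "r") as infile:
        return sum(len(line) for line in infile)


def peak_memory(read_func, path):
    """Return the peak traced allocation (in bytes) while read_func runs."""
    tracemalloc.start()
    try:
        read_func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def compare_peaks(num_lines=100000):
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as outfile:
            for n in range(num_lines):
                outfile.write("{}\n".format(n))
        return (peak_memory(consume_with_readlines, path),
                peak_memory(consume_by_iteration, path))
    finally:
        os.remove(path)
```

Calling `compare_peaks()` returns the two peaks as `(readlines_peak, iteration_peak)`. The first should grow with the file size, while the second should stay roughly constant.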

Upvotes: 2
