Is there a straightforward way to write to a file open in r+ mode without overwriting existing bytes?

Question

I have a text file test.txt, with the following contents:

Thing 1. string

I'm writing a Python script that increments the number every time it is run, without affecting the rest of the string, like so.

Run once:

Thing 2. string

Run twice:

Thing 3. string

Run three times:

Thing 4. string

Run four times:

Thing 5. string

This is the code that I'm using to accomplish this.

file = open("test.txt","r+")

started = False
beginning = 0 #start of the digits
done = False
num = 0

#building the number from digits
while not done:
        next = file.read(1)
        if ord(next) in range(48, 58): #ascii values of 0-9
                started = True
                num *= 10
                num += int(next)
        elif started: #has reached the end of the number
                done = True
        else: #has not reached the beginning of the number
                beginning += 1

num += 1
file.seek(beginning,0)
file.write(str(num))

This code works, so long as the number is not 10^n-1 (9, 99, 999, etc) because in those cases, it writes more bytes than were previously in the number. As such, it will overwrite the characters that follow.

So this brings me to the point. I need a way to write to the file that overwrites previously existing bytes, which I have, and a way to write to the file that does not overwrite previously existing bytes, which I don't have. Does such a mechanism exist in Python, and if so, what is it?

I have already tried opening the file using the line file = open("test.txt","a+") instead. When I do that, it always writes to the end, regardless of the seek point.

file = open("test.txt","w+") will not work because I need to keep the contents of the file while altering it, and files opened in any variant of w mode are wiped clean.

I have also thought of solving my problem using a function like this:

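#file is assumed to be in r+ mode
def write(string, file, index = -1):
        if index == -1:
                index = file.tell() #default to the current position
        file.seek(index, 0)
        remainder = file.read() #everything after the insertion point
        file.seek(index, 0)
        file.write(string + remainder) #new text first, then the old tail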

But I also want to be able to expand the solution to larger files, and reading the rest of the file single-handedly changes what I'm trying to accomplish from being O(1) to O(n). It also seems very non-Pythonic, since it seeks to accomplish the task in a less-than-straightforward way.

It would also make my I/O operations inconsistent: I would have class methods (file.read() and file.write()) to read from the file and write to it replacing old characters, but an external function to insert without replacing.

If I make the code inline, rather than a function, it means I have to write several of the same lines of code every time I try to write without replacing, which is also non-Pythonic.

To reiterate my question, is there a more straightforward way to do this, or am I stuck with the function?

zwol · Accepted Answer

Unfortunately, what you want to do is not possible. This is a limitation at a lower level than Python, in the operating system. Neither the Unix nor the Windows file access API offers any way to insert new bytes in the middle of a file without overwriting the bytes that were already there.
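
As an illustration of that limitation (not part of the original answer), here is a minimal sketch showing what seek-then-write does in r+ mode. The function name overwrite_demo is made up for this example, and it works on a throwaway temporary file in binary mode so that byte offsets are unambiguous.

```python
import os
import tempfile


def overwrite_demo():
    """Write b'Thing 9. string', then write b'10' at the offset of the '9'."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"Thing 9. string")
        with open(path, "r+b") as f:
            f.seek(6)        # byte offset of the digit '9'
            f.write(b"10")   # two bytes land on top of '9' and '.'
        with open(path, "rb") as f:
            return f.read().decode("ascii")
    finally:
        os.remove(path)


print(overwrite_demo())  # prints: Thing 10 string
```

The period is lost because the second byte of "10" replaces it; nothing after the write position is shifted along.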

Reading the rest of the file and rewriting it is the usual workaround. Actually, the usual workaround is to rewrite the entire file under a new name and then use rename to move it back to the old name. On Unix, this accomplishes an atomic file update - unless the computer crashes, concurrent readers will see either the new file or the old file, not some hybrid. (On Windows, sadly, a plain rename fails if the new name already exists, so if you use this strategy with a plain rename you have to delete the old file first, opening a race window where the file might appear not to exist at all.)

Yes, this is O(N), and yes, if you use the write-new-file-and-rename strategy it temporarily consumes scratch disk space equal to the size of the file (old or new, whichever is larger). That's just how it is.

I haven't thought about it enough to give you even a sketch of the code, but it should be possible to use context managers to wrap up the write-new-file-and-rename approach tidily.
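
For what it's worth, one way the context-manager idea might look is sketched below. This is an illustration under assumptions, not a tested recipe from the original answer: the names atomic_rewrite and increment_first_number are invented here, the temporary file is created in the same directory as the target so that the rename stays on one filesystem, and the final step uses os.replace, which the Python documentation describes as silently replacing an existing destination file.

```python
import contextlib
import os
import re
import tempfile


@contextlib.contextmanager
def atomic_rewrite(path):
    """Yield (src, dst): src reads the old file, dst is a temp file that
    takes the old file's place only if the with-block finishes cleanly."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline="" disables newline translation so line endings are kept.
        with os.fdopen(fd, "w", newline="") as dst, \
                open(path, "r", newline="") as src:
            yield src, dst
        # Both files are closed here; move the new file over the old name.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed: discard the temp file, leave the original alone.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_path)
        raise


def increment_first_number(path):
    """Hypothetical usage: bump the first run of digits in the file by one."""
    with atomic_rewrite(path) as (src, dst):
        text = src.read()
        bumped = re.sub(r"\d+", lambda m: str(int(m.group()) + 1), text, count=1)
        dst.write(bumped)
```

With this, a file containing "Thing 9. string" becomes "Thing 10. string" after one call to increment_first_number. It is still O(N) in the file size, and because the replacement is a freshly created temporary file, the original file's permissions and ownership are not carried over unless you copy them yourself.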
