brandonchinn178

Reputation: 539

Python file.read() doesn't return full contents (Flaky)

Roughly speaking, I'm calling the following function twice in a row:

def _read_bytes(path):
    with open(path, "rb") as f:
        print(f"f.tell() (should always be 0): {f.tell()}")
        s = f.read()
        print(f"f.read(): {s}")
        print(f"f.tell() (should be length of file): {f.tell()}")
        print(f"f.seek(0, 2) (should be length of file): {f.seek(0, 2)}")
        return s

We're seeing a flake in CI where running this twice results in the following output:

# first time
f.tell() (should always be 0): 0
f.read(): b'PAR1\x15\x00\x15\x0e......' # 1109 bytes long
f.tell() (should be length of file): 1109
f.seek(0, 2) (should be length of file): 1109

# second time
f.tell() (should always be 0): 0
f.read(): b'PAR1\x15\x00\x15\x0e......' # 10585 bytes long
f.tell() (should be length of file): 10585
f.seek(0, 2) (should be length of file): 10585

The bytes returned by the first f.read() are exactly the first 1109 bytes of what the second f.read() returns. The odd part is that f.seek(0, 2) returns a different number each time. When might f.seek(0, 2) return different values for the same file?

Upvotes: 0

Views: 207

Answers (1)

brandonchinn178

Reputation: 539

I forgot that I have multiple threads writing to the same file, so another thread may have truncated the file and only partially rewritten it while the current thread was reading it.
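For anyone who wants to see the effect in isolation, here is a minimal single-threaded sketch that forces the same interleaving by hand. It is not the code from my project: `demo_partial_read` and the payload are made up for illustration, and the explicit `flush()` stands in for whatever portion the writer thread happens to have pushed to the OS by the time the reader runs.

```python
import os
import tempfile


def demo_partial_read():
    """Read a file while a writer is halfway through rewriting it."""
    payload = b"PAR1" + b"\x00" * 100  # hypothetical stand-in for the real file contents
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Writer: opening with "wb" truncates the file to zero bytes right away.
        writer = open(path, "wb")
        writer.write(payload[:10])
        writer.flush()  # only the first 10 bytes have reached the OS so far

        # Reader: in the real bug this is another thread running mid-write.
        with open(path, "rb") as f:
            partial = f.read()
            size_mid_write = f.seek(0, 2)  # end-of-file offset as seen right now

        # Writer finishes the rest of the file.
        writer.write(payload[10:])
        writer.close()

        with open(path, "rb") as f:
            full = f.read()
        return partial, size_mid_write, full
    finally:
        os.remove(path)


partial, size_mid_write, full = demo_partial_read()
print(len(partial), size_mid_write, len(full))
```

The mid-write read is a strict prefix of the final contents, and f.seek(0, 2) reports the shorter length, which matches the pattern in the question.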

The solution was to use a file lock so that only one thread reads or writes the file at a time.
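A minimal sketch of the idea, assuming all readers and writers are threads in the same process, so a plain `threading.Lock` is enough. The names `_file_lock`, `write_bytes` and `read_bytes` are hypothetical, not the ones from my code. If separate processes touch the file, an in-process lock will not help and you would need an OS-level lock instead (for example `fcntl.flock` on Unix, or the third-party `filelock` package).

```python
import threading

# One lock shared by every thread that touches the file (illustrative name).
_file_lock = threading.Lock()


def write_bytes(path, data):
    """Rewrite the file while holding the lock, so no reader sees it half-written."""
    with _file_lock:
        with open(path, "wb") as f:  # "wb" truncates, hence the need for the lock
            f.write(data)


def read_bytes(path):
    """Read the whole file while holding the lock, so no writer can truncate it mid-read."""
    with _file_lock:
        with open(path, "rb") as f:
            return f.read()
```

The lock has to cover the whole open/write/close and open/read/close sequences, not just the `write()` or `read()` call, because the truncation happens at `open()` time.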

Upvotes: 1
