Roughly speaking, I'm calling the following function twice in a row:
def _read_bytes(path):
    with open(path, "rb") as f:
        print(f"f.tell() (should always be 0): {f.tell()}")
        s = f.read()
        print(f"f.read(): {s}")
        print(f"f.tell() (should be length of file): {f.tell()}")
        print(f"f.seek(0, 2) (should be length of file): {f.seek(0, 2)}")
    return s
We're seeing a flake in CI where running this twice results in the following output:
# first time
f.tell() (should always be 0): 0
f.read(): b'PAR1\x15\x00\x15\x0e......' # 1109 bytes long
f.tell() (should be length of file): 1109
f.seek(0, 2) (should be length of file): 1109
# second time
f.tell() (should always be 0): 0
f.read(): b'PAR1\x15\x00\x15\x0e......' # 10585 bytes long
f.tell() (should be length of file): 10585
f.seek(0, 2) (should be length of file): 10585
The bytes returned by the first f.read() call are exactly the first 1109 bytes returned by the second f.read() call. The really odd thing is that f.seek(0, 2) returns a different value each time. When might f.seek(0, 2) return different values for the same file?
I forgot that I have multiple threads writing to the same file, so another thread could have truncated the file and only partially rewritten it while the current thread was reading it. facepalm
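A minimal, single-threaded sketch of that interleaving, assuming the writer opens the file with "wb" (which truncates it) and the reader happens to run after only part of the new contents has been flushed. The writer's steps are run inline instead of on a real thread so the outcome is deterministic, and the 10585-byte payload and the helper name `_read_bytes_and_end` are made up for illustration:

```python
import os
import tempfile


def _read_bytes_and_end(path):
    # Hypothetical trimmed-down variant of _read_bytes from the question:
    # returns the bytes read plus the end-of-file offset from f.seek(0, 2).
    with open(path, "rb") as f:
        s = f.read()
        end = f.seek(0, 2)
    return s, end


# Made-up stand-in for the real file contents (10585 bytes long).
full = b"PAR1" + bytes(10581)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as w:   # the "other thread": "wb" empties the file
        w.write(full[:1109])
        w.flush()                  # only the first 1109 bytes are in the file so far
        first, first_end = _read_bytes_and_end(path)  # our read lands right here
        w.write(full[1109:])       # the writer finishes; close() flushes the rest
    second, second_end = _read_bytes_and_end(path)

print(first_end, second_end)      # 1109 10585
print(second.startswith(first))   # True
```

f.seek(0, 2) just reports where the end of the file is at the moment of the call, so two reads of the same path can legitimately see two different sizes, with the first result being a prefix of the second.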
The solution was to use a file lock so that only one thread reads or writes the file at a time.
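A minimal sketch of that fix, assuming every reader and writer lives in the same process, so a single threading.Lock is enough (separate processes would need an OS-level lock instead, e.g. fcntl.flock on POSIX). The names _file_lock and _write_bytes, and the payload sizes, are hypothetical:

```python
import os
import tempfile
import threading

# Hypothetical lock shared by every thread that touches the file.
# threading.Lock only coordinates threads within one process.
_file_lock = threading.Lock()


def _write_bytes(path, data):
    # Hold the lock across truncate + write + close, so no reader can
    # observe an empty or half-written file.
    with _file_lock:
        with open(path, "wb") as f:
            f.write(data)


def _read_bytes(path):
    # Readers take the same lock, so they see one complete version of the file.
    with _file_lock:
        with open(path, "rb") as f:
            return f.read()


# Usage: several writers and readers hitting the same file concurrently.
payloads = [bytes([i]) * (1000 * (i + 1)) for i in range(4)]
results = []

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    _write_bytes(path, payloads[0])

    def writer(data):
        for _ in range(50):
            _write_bytes(path, data)

    def reader():
        for _ in range(50):
            results.append(_read_bytes(path))

    threads = [threading.Thread(target=writer, args=(p,)) for p in payloads]
    threads += [threading.Thread(target=reader) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Every read returned one complete payload, never a truncated prefix.
assert all(r in payloads for r in results)
```

Closing the file inside the locked region matters: it guarantees the buffered data has been flushed before another thread can open the file.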