Reputation: 123
I'm trying to read a big text file containing 6000 lines of the same length. The file is accessed from several processes, and I acquire a mutex to prevent race conditions. But the output contains partially read lines:
58 '444444444444444444444444444444444444444444444444444444444\n'
58 '333333333333333333333333333333333333333333333333333333333\n'
46 '444444444444444444444444442222222222222222222\n'
58 '444444444444444444444444444444444444444444444444444444444\n'
Code I'm trying to run:
```python
import multiprocessing as mp


class Loader:
    def __init__(self, path):
        self.lock = mp.Lock()
        self.file = open(path, 'r')

    def read(self):
        with self.lock:
            try:
                line = next(self.file)
                print(len(line), repr(line))
            except StopIteration:
                return False
        return True


def worker(loader):
    while loader.read():
        pass


if __name__ == '__main__':
    loader = Loader('./data.txt')
    workers = []
    for i in range(4):
        w = mp.Process(target=worker, args=(loader,))
        w.daemon = True
        w.start()
        workers.append(w)
    for w in workers:
        w.join()
```
At first, I expected to get an error when copying the file descriptor to another process, but the program started and all processes can read from the file.
What puzzles me is the race condition: why doesn't each process read a whole line, even though it holds the lock while reading?
Upvotes: 4
Views: 1221
Reputation: 40013
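You don’t fail to copy a file object because you don’t copy anything (in the usual sense). You’re using the fork start method (the long-standing default on Linux), so each process inherits (a copy-on-write version of) the same open file. With the spawn start method, the default on Windows and macOS, the file object would have to be pickled, and that does fail.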
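A minimal sketch (Unix only, assuming the fork start method is available; the helper names and the temporary file are made up for illustration) showing that the file offset lives in the shared open file description rather than in each process: the child reads 10 bytes, and the parent, which never read anything itself, finds its own offset at 10. The file is opened unbuffered (buffering=0) so that Python's read-ahead doesn't obscure the effect.

```python
import multiprocessing as mp
import os
import tempfile


def child_read(f):
    # The child reads 10 bytes through the file object it inherited via fork.
    f.read(10)


def shared_offset_demo():
    # Hypothetical test file: 100 bytes of 'x'.
    with tempfile.NamedTemporaryFile('w', delete=False, suffix='.txt') as tmp:
        tmp.write('x' * 100)
        path = tmp.name
    try:
        ctx = mp.get_context('fork')  # fork is Unix-only
        # Unbuffered binary mode: each read() is one read(2) system call.
        with open(path, 'rb', buffering=0) as f:
            p = ctx.Process(target=child_read, args=(f,))
            p.start()
            p.join()
            # The parent never read, yet the shared offset has moved.
            return os.lseek(f.fileno(), 0, os.SEEK_CUR)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(shared_offset_demo())
```

With the default buffered mode the child would instead pull a whole buffer-sized chunk (here, the entire 100-byte file) into its private buffer, which is exactly what happens to the worker processes in the question.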
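So (as VPfB pointed out) each process does its own buffering, but the underlying open file description is shared, and that is where the file offset lives. The lock only serialises the calls to next(): a process serves lines from its private buffer (a chunk of several kilobytes) until that buffer ends partway through a line, and then refills it from wherever the shared offset now points, which the other processes have meanwhile moved. The rest of that line therefore comes from an unrelated part of the file, which is how a line that starts with 4s and ends with 2s appears. If you read a character before starting the processes, you’ll see that they all report the same prefix of the file (the parent's buffer, inherited by every child) before diverging (and mixing up lines).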
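One way to make the question's approach work, sketched under the assumption of a Unix system with the fork start method: open the file unbuffered in binary mode, so no process has a private buffer and the shared kernel offset is the only read position. Because a raw FileIO object has no peek(), its readline() reads one byte at a time and stops right after the newline, so while the lock is held the offset never moves past the line being returned. This is my own variation rather than the only fix; the ctx parameter, the read_all helper, the result queue and the None sentinel are hypothetical additions for illustration.

```python
import multiprocessing as mp
import os
import tempfile


class Loader:
    def __init__(self, path, ctx):
        self.lock = ctx.Lock()
        # Unbuffered binary mode: no per-process read-ahead buffer, so the
        # kernel's shared file offset is the only read position.
        self.file = open(path, 'rb', buffering=0)

    def read(self):
        with self.lock:
            # FileIO has no peek(), so readline() reads one byte at a time and
            # stops right after b'\n'; the shared offset never runs past the
            # end of the line this process returns.
            return self.file.readline()


def worker(loader, out):
    while True:
        line = loader.read()
        if not line:  # b'' means end of file
            break
        out.put(line)
    out.put(None)  # sentinel: this worker has finished


def read_all(path, nworkers=4):
    ctx = mp.get_context('fork')  # Unix-only: children inherit the open file
    loader = Loader(path, ctx)
    out = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(loader, out))
             for _ in range(nworkers)]
    for p in procs:
        p.start()
    lines, done = [], 0
    # Drain the queue before join() so no child blocks on a full pipe.
    while done < nworkers:
        item = out.get()
        if item is None:
            done += 1
        else:
            lines.append(item)
    for p in procs:
        p.join()
    loader.file.close()
    return lines


if __name__ == '__main__':
    # Hypothetical test data: 6000 lines of 57 digits plus '\n' (58 bytes).
    with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
        for i in range(6000):
            tmp.write(str(i % 10) * 57 + '\n')
        path = tmp.name
    lines = read_all(path)
    os.remove(path)
    print(len(lines), all(len(line) == 58 for line in lines))
```

Reading one byte per system call is slow for large files. Having the parent read the lines and hand them to the workers through a queue, or giving each process its own file object and byte range, avoids sharing an offset at all.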
Upvotes: 1