Vdimir

Reputation: 123

Why do I get a race condition when reading a file from different processes?

I'm trying to read a big text file containing 6000 lines of the same length. The file is accessed from several processes, and I acquire a mutex to prevent race conditions. But the output contains partially read lines:

58 '444444444444444444444444444444444444444444444444444444444\n'
58 '333333333333333333333333333333333333333333333333333333333\n'
46 '444444444444444444444444442222222222222222222\n'
58 '444444444444444444444444444444444444444444444444444444444\n'

The code I'm trying to run:

import multiprocessing as mp

class Loader:
    def __init__(self, path):
        self.lock = mp.Lock()
        self.file = open(path, 'r')

    def read(self):
        with self.lock:
            try:
                line = next(self.file)
                print(len(line), repr(line))
            except StopIteration:
                return False
        return True


def worker(loader):
    while loader.read():
        pass

if __name__ == '__main__':
    loader = Loader('./data.txt')

    workers = []
    for i in range(4):
        w = mp.Process(target=worker, args=(loader,))
        w.daemon = True
        w.start()
        workers.append(w)

    for w in workers:
        w.join()

At first I expected to get an error when copying the file descriptor to another process, but the program started and all processes can read from the file. The race condition puzzles me, though: why doesn't each process read a whole line?

Upvotes: 4

Views: 1221

Answers (1)

Davis Herring

Reputation: 40013

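You don’t fail to copy the file object because you don’t copy anything (in the usual sense). You’re using the (Unix) default fork start method, so each process inherits (a copy-on-write version of) the same open file. Under the spawn start method, by contrast, the Loader would have to be pickled to reach each child, and that would fail because open file objects can’t be pickled.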
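A quick way to check the spawn case, as a sketch assuming only the standard library (the `can_pickle` helper is hypothetical, and a temporary file stands in for `./data.txt`): spawn sends a `Process`'s arguments to the child by pickling them, so anything that fails to pickle here would also make `w.start()` raise.

```python
import pickle
import tempfile


def can_pickle(obj):
    """Hypothetical helper: True if obj can be pickled, as spawn requires for Process args."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


if __name__ == '__main__':
    # A temporary file stands in for open('./data.txt').
    with tempfile.TemporaryFile('w+') as f:
        print(can_pickle(f))         # False: open file objects refuse to pickle
    print(can_pickle('./data.txt'))  # True: a path string pickles fine
```

Passing the path instead and opening the file inside each worker works under any start method and gives each process its own file offset, though each process would then read the whole file independently.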

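So (as VPfB pointed out) each process does its own buffering, but the underlying open file description, which holds the file offset, is shared. Each process’s buffered reader fetches a large chunk (typically several kilobytes) from the current shared offset rather than a single line, so the chunks the processes get start and end at arbitrary points. When a process reaches the end of its buffer in the middle of a line, it completes that line from whatever chunk the shared offset now points to, which is how a line like the 46-character one above mixes 4s and 2s. The lock only serializes the calls to next(); it cannot make the per-process buffers consistent. If you read a character before starting the processes, you’ll see that they all report the same prefix of the file before diverging (and mixing up lines), because each one inherits a copy of the parent’s already filled buffer.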
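The shared offset can be observed directly. This is a Unix-only sketch using `os.fork` (which the fork start method calls internally) on a generated stand-in for `data.txt`; the file contents and variable names are assumptions for illustration. The child reads one line, yet the offset seen by the parent jumps by a whole buffer, and the parent's next line begins mid-line. The exact offset depends on the interpreter's buffer size, so no specific value is shown.

```python
import os
import tempfile

# Hypothetical stand-in for data.txt: 6000 lines of 57 digits plus '\n' (58 bytes each).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(6000):
        tmp.write(str(i % 10) * 57 + '\n')
    path = tmp.name

f = open(path, 'r')  # opened BEFORE forking, so parent and child share one offset
pid = os.fork()      # Unix only
if pid == 0:
    f.readline()     # the child asks for a single 58-byte line...
    os._exit(0)      # leave immediately without running the parent's code
os.waitpid(pid, 0)

# ...but its private buffer pulled a whole chunk, advancing the shared offset past line 1.
shared_offset = os.lseek(f.fileno(), 0, os.SEEK_CUR)
# The parent's buffer is empty, so its first "line" starts wherever that offset landed.
parent_line = f.readline()
print(shared_offset, len(parent_line))

f.close()
os.remove(path)
```

Running this repeatedly prints an offset far larger than 58 and a first line shorter than 58 characters: the same effect that produces the truncated and mixed lines in the question.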

Upvotes: 1
