Jonathan Wallace

Reputation: 31

Blocking file read in python

I need to read binary data structures as they are flushed to a file on Windows. I don't have control over the program that writes the data; it's a black-box LP model that always writes to a few hard-coded filenames, but I do know it flushes its output periodically. I would like to read this data as it's written, from several files at once. I don't have a problem starting a thread for each file, but it would be really convenient if I could use read(n) and have it block until it has read all n bytes, or readinto(d) and have it block until the buffer is full. Is this possible in Python on Windows?

I'm having a tough time searching for this because everyone only talks about non-blocking I/O and how to do it. With this solution, though, I intend to let the child threads block and report data back via a queue to a parent that doesn't block.

If there isn't a way to get the blocking reads, is there a way to avoid busy waiting or sleep()ing?

Upvotes: 3

Views: 4806

Answers (1)

user559633

Reputation:

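A file read in Python blocks only the thread that calls it. CPython releases the GIL while it waits on I/O, so the other threads in the same process keep running. What you don't get for free is the waiting: on a regular file, read(n) returns whatever is available, possibly nothing, as soon as it reaches the current end of the file instead of blocking until the writer adds more.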

Since you're reading binary data as it hits the disk, you could read up to N bytes, compare what you got against the size you want, loop as required, and return once you have the full N bytes.

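For example, as a runnable sketch (that you should not use as is):

import time

my_file = open('/Users/tfisher/sputnik.m4a', 'rb')
megabyte_in_bytes = 1000000

def chunk_reader(file=my_file, chunk_size=megabyte_in_bytes, poll_interval=0.1):
    _return_chunk = bytearray()

    # len() counts the bytes collected so far; sys.getsizeof() would
    # report the object's memory footprint, not the amount of data
    while len(_return_chunk) < chunk_size:

        # each read continues from the current file position, which keeps
        # moving forward until the file is seek(0)'d or reopened
        data = file.read(chunk_size - len(_return_chunk))

        if data:
            _return_chunk += data
            print("Reading file. Current size: {0}".format(len(_return_chunk)))
        else:
            # read() returned b'' at the current end of file; pause briefly
            # (poll_interval is an arbitrary value) so the loop does not spin
            time.sleep(poll_interval)

    return _return_chunk

print(chunk_reader())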
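
The question also mentions readinto(d). Below is a minimal sketch of the same looping idea applied to a preallocated buffer. The names readinto_full, poll_interval and timeout are made up for this illustration and are not part of the standard library, and the polling interval is an arbitrary choice:

```python
import time

def readinto_full(f, buf, poll_interval=0.05, timeout=None):
    """Fill buf from the binary file f, waiting whenever end of file is hit.

    Returns the number of bytes stored: len(buf) on success, or fewer
    if the optional timeout (in seconds) expires first.
    """
    view = memoryview(buf)  # lets us fill the buffer in place, slice by slice
    filled = 0
    deadline = None if timeout is None else time.monotonic() + timeout

    while filled < len(view):
        n = f.readinto(view[filled:])
        if n:
            filled += n
            continue
        # nothing new at the current end of file: give up or wait a little
        if deadline is not None and time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)

    return filled
```

A call might look like readinto_full(open(path, 'rb'), bytearray(64)), with the record size of 64 bytes chosen purely for illustration.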

If you don't want a busy-wait loop, you can coordinate the readers inside your own process with synchronization primitives such as semaphores, or with a file-reading class that increments a lock counter when it starts to read(). Keep in mind that these primitives only coordinate threads within your process; they cannot tell you when the external writer has flushed new data.
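
One way the thread-per-file layout from the question might look is sketched below, with each worker pushing fixed-size records onto a shared queue. The names read_exactly, watch_file and start_watchers are hypothetical, the sketch assumes every file already exists when its worker starts, and it still polls: threading.Event.wait() stands in for sleep() so that a worker wakes up promptly when asked to stop.

```python
import queue
import threading

def read_exactly(f, n, stop, poll_interval=0.05):
    """Return exactly n bytes from f, or None if stop is set first."""
    chunk = bytearray()
    while len(chunk) < n:
        data = f.read(n - len(chunk))
        if data:
            chunk += data
        elif stop.wait(poll_interval):  # waits, but returns True early on stop
            return None
    return bytes(chunk)

def watch_file(path, record_size, out_queue, stop):
    """Worker: put (path, record) tuples on out_queue until stop is set."""
    with open(path, "rb") as f:
        while not stop.is_set():
            record = read_exactly(f, record_size, stop)
            if record is not None:
                out_queue.put((path, record))

def start_watchers(paths, record_size):
    """Start one daemon thread per file; return (queue, stop_event, threads)."""
    out_queue = queue.Queue()
    stop = threading.Event()
    threads = [
        threading.Thread(
            target=watch_file,
            args=(p, record_size, out_queue, stop),
            daemon=True,
        )
        for p in paths
    ]
    for t in threads:
        t.start()
    return out_queue, stop, threads
```

The parent can then call get() on the returned queue (or get_nowait() if it must never block) and call stop.set() when it is finished; only the worker threads ever wait on the files.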

Upvotes: 1
