Reputation: 31
I need to read binary data structures as they are flushed to a file on Windows. I don't have control over the program that writes the data; it's a black-box LP model that always writes to a few hard-coded filenames, but I do know it flushes its output periodically. I would like to read this data as it's written, from several files at once. I don't have a problem forking a thread for each file, but it would be really convenient if I could use read(n) and have it block until it reads an entire n bytes, or readinto(d) and have it block until the buffer is full. Is this possible to do in Python on Windows?
I'm having a tough time searching for this because all anyone ever talks about is non-blocking and how to do it. But with this solution I intend to let the children block and report data back via a queue to a parent that doesn't block.
If there isn't a way to get the blocking reads, is there a way to avoid busy waiting or sleep()ing?
Upvotes: 3
Views: 4806
Reputation:
A file read in Python blocks only the thread that calls it. CPython releases the GIL while it waits on I/O, so the other threads in the same process keep running, and one reader thread per file is a workable design.
Since you're reading binary data as it hits the disk, you could read up to N bytes, compare what you have against the desired size, loop as required, and return when done.
For example, as a runnable sketch (not something to use as is):
my_file = open('/Users/tfisher/sputnik.m4a', 'rb')
megabyte_in_bytes = 1000000

def chunk_reader(file=my_file, chunk_size=megabyte_in_bytes):
    # Accumulate data until exactly chunk_size bytes have been collected.
    _return_chunk = bytearray()
    while len(_return_chunk) < chunk_size:
        print("Reading file. Current size: {0}".format(len(_return_chunk)))
        # Each read() continues from the current file position. At
        # end-of-file it returns b'' right away, so this loop busy-waits
        # until the writer appends more data.
        remaining = chunk_size - len(_return_chunk)
        _return_chunk += file.read(min(10, remaining))
    return _return_chunk

print(chunk_reader())
If you don't want a busy-wait loop, you can check if the file is locked in other threads within the same process by making use of synchronization primitives like semaphores or by creating a file reading class that increments a lock value when starting to read().
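The thread-per-file design described in the question could be sketched roughly as below. This is only a sketch under stated assumptions: `read_exact`, `follow`, `poll_interval`, and `record_size` are hypothetical names introduced for illustration, and fixed-size records are assumed. A read on a regular file returns immediately at end-of-file instead of blocking, so the helper emulates a blocking read by sleeping briefly and retrying. It does not remove polling; it only confines it to the worker threads while the parent blocks on the queue. It also assumes the writing program opens its files in a way that lets another process read them.

```python
import queue
import threading
import time


def read_exact(f, n, poll_interval=0.05, stop_event=None):
    """Return exactly n bytes from f, waiting for the writer if needed.

    Returns None if stop_event is set while no more data is available.
    """
    buf = bytearray()
    while len(buf) < n:
        data = f.read(n - len(buf))  # may return fewer than requested
        if data:
            buf += data
        elif stop_event is not None and stop_event.is_set():
            return None  # asked to stop before a full record arrived
        else:
            time.sleep(poll_interval)  # at EOF: wait for the next flush
    return bytes(buf)


def follow(path, record_size, out_queue, stop_event):
    """Worker thread body: put (path, record) on out_queue per record."""
    # buffering=0 gives a raw file object, so every read() goes to the OS
    # and sees data appended since the previous call.
    with open(path, "rb", buffering=0) as f:
        while True:
            record = read_exact(f, record_size, stop_event=stop_event)
            if record is None:
                break
            out_queue.put((path, record))
```

The parent would start one `threading.Thread(target=follow, args=(path, record_size, q, stop))` per file and then call `q.get()`, which blocks without spinning until some worker delivers a complete record.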
Upvotes: 1