Reputation: 173
Given a list of byte ranges that have to be skipped:
```python
skip_ranges = [(1, 3), (5, 7)]
```
and a binary file:
```python
f = open('test', 'rb')
```
What is the fastest way to return the file's contents with bytes 1-3 and 5-7 left out, without modifying the original file?
Input (file contents):
012345678
Output:
048
Please note that this question is specifically about (possibly large) binary files, so a generator would be best.
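To pin down the semantics implied by the example (zero-based offsets, both ends of each range inclusive), here is a minimal in-memory reference sketch. The name `skip_bytes_reference` is hypothetical, and because it loads all the data at once it is only meant for checking faster implementations on small inputs, not for large files.

```python
def skip_bytes_reference(data, skip_ranges):
    """Return data without the bytes at offsets covered by skip_ranges.

    Ranges are (start, stop) pairs with both ends inclusive, matching the
    example above. Everything is held in memory, so this is only a
    correctness check for small inputs.
    """
    skipped = set()
    for start, stop in skip_ranges:
        skipped.update(range(start, stop + 1))
    # Iterating over bytes yields ints; bytes() rebuilds them into a bytes object.
    return bytes(b for i, b in enumerate(data) if i not in skipped)


print(skip_bytes_reference(b'012345678', [(1, 3), (5, 7)]))  # b'048'
```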
Upvotes: 2
Views: 1900
Reputation: 36785
You said the file might be huge, so I have adapted @juanpa.arrivillaga's solution to read the file in chunks and yield the individual chunks from a generator:
```python
def read_ranges(filename, skip_ranges, chunk_size=1024):
    """Yield the file's contents in chunks, leaving out the inclusive skip ranges."""
    with open(filename, 'rb') as f:
        prev = -1
        for start, stop in skip_ranges:
            # Number of bytes to keep between the previous skip and this one
            end = start - prev - 1
            # Go to next skip-part in chunk_size steps
            while end > chunk_size:
                data = f.read(chunk_size)
                if not data:
                    break
                yield data
                end -= chunk_size
            # Read last bit that didn't fit in chunk
            yield f.read(end)
            # Seek to next skip
            f.seek(stop + 1, 0)
            prev = stop
        else:
            # Read remainder of file in chunks
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                yield data

print(list(read_ranges('test', skip_ranges)))
```
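`read_ranges` computes each gap as `start - prev - 1` and seeks to `stop + 1`, which assumes the ranges are sorted and do not overlap. With unsorted or overlapping input the gap goes negative, and `f.read()` with a negative size reads to EOF, so the output would be wrong. One option, sketched here under that assumption, is to normalize the ranges first; `normalize_ranges` is a hypothetical helper name, not part of the answer above.

```python
def normalize_ranges(skip_ranges):
    """Sort inclusive (start, stop) ranges and merge overlapping or adjacent ones."""
    merged = []
    for start, stop in sorted(skip_ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend that range instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


print(normalize_ranges([(5, 7), (1, 3), (2, 4)]))  # [(1, 7)]
```

The call would then be `read_ranges('test', normalize_ranges(skip_ranges))`, which behaves the same as before for input that is already sorted and non-overlapping.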
Upvotes: 3
Reputation: 96172
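This approach should be relatively fast, since it makes one read() call per kept range and seeks over each skipped range. Note that it collects the whole result in a bytearray instead of yielding it: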
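```python
ba = bytearray()
with open('test.dat', 'rb') as f:
    prev = -1
    for start, stop in skip_ranges:
        # Keep the bytes between the previous skipped range and this one
        ba.extend(f.read(start - prev - 1))
        # Jump past the skipped range (both ends inclusive)
        f.seek(stop + 1, 0)
        prev = stop
    else:
        # The loop has no break, so this always runs: keep the rest of the file
        ba.extend(f.read())
```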
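If you want the one-slice-per-kept-range structure of the approach above but as a generator, a different technique is to memory-map the file with the standard-library `mmap` module and yield slices between the skipped ranges. This is a sketch under stated assumptions: the name `iter_kept_bytes` and the 1 MiB default `chunk_size` are hypothetical choices, and the ranges are assumed to be sorted, non-overlapping and inclusive. Kept regions are sliced in bounded pieces so that a single huge kept region is not copied into memory at once.

```python
import mmap
import os


def iter_kept_bytes(filename, skip_ranges, chunk_size=1024 * 1024):
    """Yield the file's bytes, leaving out the inclusive (start, stop) ranges.

    Assumes skip_ranges is sorted and non-overlapping. Kept regions are
    sliced out of a read-only memory map in pieces of at most chunk_size.
    """
    with open(filename, 'rb') as f:
        # mmap cannot map an empty file, and there is nothing to yield anyway.
        if os.fstat(f.fileno()).st_size == 0:
            return
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            pos = 0
            # A sentinel range starting at EOF makes the loop keep the tail too.
            for start, stop in list(skip_ranges) + [(size, size)]:
                end = min(start, size)
                # Yield the kept region [pos, end) in bounded slices.
                while pos < end:
                    step = min(pos + chunk_size, end)
                    yield mm[pos:step]
                    pos = step
                # Jump past the skipped range (stop is inclusive).
                pos = max(pos, stop + 1)
```

For the question's example file, `b''.join(iter_kept_bytes('test', [(1, 3), (5, 7)]))` gives `b'048'`; for large files you would instead write each piece to an output file as it arrives rather than joining them.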
Upvotes: 1