Skip some bytes of a file and return content

Question

Given a list of byte ranges that have to be skipped:

skip_ranges = [(1, 3), (5, 7)]

and a binary file:

f = open('test', 'rb')

What is the fastest way to return the file contents without bytes 1-3 and 5-7 (both ranges inclusive), without modifying the original file?

Input (file contents):

012345678

Output:

048

Please note that this question is specifically about (possibly large) binary files, so a generator would be best.
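As a point of reference, here is a minimal in-memory sketch that pins down the semantics assumed in this thread: ranges are inclusive, so `(1, 3)` removes bytes 1, 2 and 3. It loads the whole payload into memory, so it only suits small files; the helper name `skip_bytes_in_memory` is hypothetical and not part of the question or the answer.

```python
def skip_bytes_in_memory(data, skip_ranges):
    """Return `data` with the inclusive byte ranges in `skip_ranges` removed.

    Hypothetical helper for small inputs only: the whole payload is in memory.
    Ranges are sorted first, and overlapping ranges are tolerated.
    """
    keep = bytearray()
    pos = 0  # first byte not yet copied or skipped
    for start, stop in sorted(skip_ranges):
        # Copy the bytes between the previous skip and this one
        if start > pos:
            keep += data[pos:start]
        # Ranges are inclusive, so resume right after `stop`
        pos = max(pos, stop + 1)
    # Copy whatever follows the last skip range
    keep += data[pos:]
    return bytes(keep)


print(skip_bytes_in_memory(b'012345678', [(1, 3), (5, 7)]))  # b'048'
```

A streaming generator should produce chunks whose concatenation equals this result.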

Nils Werner · Accepted Answer

You said the file might be huge, so I have adapted @juanpa.arrivillaga's solution to read the file in chunks and yield the individual chunks from a generator:

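def read_ranges(filename, skip_ranges, chunk_size=1024):
    """Yield the contents of `filename` in chunks, skipping the inclusive
    byte ranges in `skip_ranges` (assumed sorted and non-overlapping)."""
    with open(filename, 'rb') as f:
        prev = -1
        for start, stop in skip_ranges:
            # Number of bytes to keep between the previous skip and this one
            end = start - prev - 1

            # Go to next skip-part in chunk_size steps
            while end > chunk_size:
                data = f.read(chunk_size)
                if not data:
                    break
                yield data
                end -= chunk_size

            # Read last bit that didn't fit in chunk (nothing if ranges touch)
            if end > 0:
                yield f.read(end)

            # Seek past the (inclusive) skip range
            f.seek(stop + 1, 0)
            prev = stop

        # Read remainder of file in chunks
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data

skip_ranges = [(1, 3), (5, 7)]
print(list(read_ranges('test', skip_ranges)))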
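The question starts from an already-open file object `f`, while `read_ranges` takes a filename. Below is a minimal sketch of the same chunked seek-and-read idea that accepts any seekable binary file object, which also makes it easy to try on an in-memory `io.BytesIO` buffer. The name `read_ranges_from_fileobj` is hypothetical; like the answer, it assumes the ranges are sorted, non-overlapping and inclusive, and it treats offsets as relative to the start of the file.

```python
import io


def read_ranges_from_fileobj(f, skip_ranges, chunk_size=1024):
    """Yield chunks of the seekable binary file object `f`, skipping the
    inclusive byte ranges in `skip_ranges` (assumed sorted, non-overlapping).

    Hypothetical variant of read_ranges; offsets are from the file start.
    """
    pos = 0
    f.seek(0)
    for start, stop in skip_ranges:
        # Copy [pos, start) in pieces of at most chunk_size bytes
        remaining = start - pos
        while remaining > 0:
            data = f.read(min(chunk_size, remaining))
            if not data:  # reached EOF before this skip range
                return
            yield data
            remaining -= len(data)
        # Jump past the inclusive skip range
        pos = stop + 1
        f.seek(pos)
    # Copy everything after the last skip range
    while True:
        data = f.read(chunk_size)
        if not data:
            break
        yield data


buf = io.BytesIO(b'012345678')
chunks = list(read_ranges_from_fileobj(buf, [(1, 3), (5, 7)], chunk_size=2))
print(chunks)            # [b'0', b'4', b'8']
print(b''.join(chunks))  # b'048'
```

Capping each read at `min(chunk_size, remaining)` avoids the separate "last bit" read, and the generator never holds more than `chunk_size` bytes at once, so it suits large files.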
