JeffThompson

Reputation: 1600

Python read file by bytes until sequence of bytes

How can I read a file in Python byte-by-byte until a specific sequence of bytes is reached?

This must happen all the time with libraries that read specific kinds of files to parse the header, scan for parameters, etc.

As an example: I'm reading through the PNG spec and see that pixel data starts after the byte sequence IDAT.

I can read the file like this:

with open('image.png', 'rb') as f:
    byte = f.read(1)
    # In binary mode read() returns b'' (not '') at end of file
    while byte != b'':
        byte = f.read(1)

But since I'm only reading one byte at a time, I can't watch for IDAT directly (since I'd only get the I but not the other three bytes). I can't read the file by chunks of four bytes because it won't always line up correctly.

I can imagine keeping track of the last four bytes but thought perhaps there was a more elegant way?
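For reference, the "keep track of the last four bytes" idea could look roughly like the sketch below, which uses a fixed-size `collections.deque` as a sliding window. The helper name `read_until` and its parameters are hypothetical, not part of any library, and the in-memory `io.BytesIO` only stands in for a real file.

```python
import io
from collections import deque


def read_until(stream, marker):
    """Read `stream` one byte at a time until `marker` has been consumed.

    Returns the offset of the first byte after the marker, or -1 if the
    marker never appears. Hypothetical helper; assumes a non-empty marker.
    """
    window = deque(maxlen=len(marker))  # holds only the last len(marker) bytes
    target = tuple(marker)              # bytes iterate as ints in Python 3
    position = 0
    while True:
        byte = stream.read(1)
        if not byte:                    # b'' means end of file
            return -1
        position += 1
        window.append(byte[0])
        if tuple(window) == target:
            return position


# Usage with an in-memory stand-in for a file
demo = io.BytesIO(b'\x89PNG....IDATpixels')
end_of_marker = read_until(demo, b'IDAT')  # 12
remaining = demo.read()                    # b'pixels'
```

The stream is left positioned just past the marker, so a following `read()` returns whatever comes next.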

Upvotes: 1

Views: 2444

Answers (2)

chepner

Reputation: 531490

Use mmap and treat the file like a giant bytes string.

import mmap

with open('image.png', 'rb') as f:
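    # access=ACCESS_READ works on both Unix and Windows; the third positional
    # argument is flags (Unix) or tagname (Windows), not the protection mode
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf: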
        offset = mf.find(b'IDAT')
        if offset == -1:
            raise Exception("IDAT not found")
    f.seek(offset)
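A PNG file may contain more than one IDAT chunk, so it can be useful to collect every offset rather than only the first. The sketch below repeats `find` with a moving start position. The function name `find_all_offsets` is hypothetical, and the temporary file only stands in for a real image.

```python
import mmap
import os
import tempfile


def find_all_offsets(path, marker):
    """Return the offsets of every non-overlapping `marker` in the file.

    Hypothetical helper; assumes a non-empty marker and a non-empty file
    (mmap cannot map a zero-length file).
    """
    if not marker:
        raise ValueError("marker must not be empty")
    offsets = []
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            start = mf.find(marker)
            while start != -1:
                offsets.append(start)
                # resume the search just past the match we just recorded
                start = mf.find(marker, start + len(marker))
    return offsets


# Usage with a small temporary file standing in for a PNG
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x89PNG....IDATaaaaIDATbbbb')
    tmp_path = tmp.name

offsets = find_all_offsets(tmp_path, b'IDAT')  # [8, 16]
os.remove(tmp_path)
```

Note that a plain byte search can also match the letters IDAT inside compressed data, so a real parser would walk the chunk lengths instead of trusting every hit.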

Upvotes: 0

kkawabat

Reputation: 1677

If you aren't married to the idea of going byte by byte, you can read the data into one long bytes object and then split it on occurrences of IDAT.

with open('image.png', 'rb') as f:
    data = f.read()  # the whole file as a single bytes object
# Element 0 is everything before the first IDAT, so skip it
idat_parts = data.split(b'IDAT')[1:]
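If the file is too large to read in one go, a middle ground is to read fixed-size chunks and carry over the last `len(marker) - 1` bytes, so a marker that straddles two chunks is still found. This is a different technique from the split above, offered only as a sketch: `find_in_chunks` and `chunk_size` are hypothetical names, and it assumes `read()` returns `b''` only at end of file.

```python
import io


def find_in_chunks(stream, marker, chunk_size=4096):
    """Return the offset of the first `marker` in `stream`, or -1.

    Hypothetical helper: reads `chunk_size` bytes at a time and keeps a
    short tail from the previous data to catch markers split across chunks.
    """
    overlap = len(marker) - 1
    tail = b''      # trailing bytes carried over from earlier chunks
    consumed = 0    # total bytes read before the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:               # end of file, marker not found
            return -1
        buffer = tail + chunk
        index = buffer.find(marker)
        if index != -1:
            # buffer starts len(tail) bytes before the current chunk
            return consumed - len(tail) + index
        tail = buffer[-overlap:] if overlap > 0 else b''
        consumed += len(chunk)


# Usage: a tiny chunk size forces the marker to span two reads
demo = io.BytesIO(b'abcIDATxyz')
offset = find_in_chunks(demo, b'IDAT', chunk_size=5)  # 3
```

Memory use stays bounded by roughly `chunk_size + len(marker)` regardless of file size.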

Upvotes: 3
