buhtz

Reputation: 12162

How to identify a BytesIO object as a zip file (and not an Excel file) with Python?

I have to deal with BytesIO objects. Some of them contain "regular" files, but others are compressed with ZipFile. I need to tell them apart.

I looked at https://en.wikipedia.org/wiki/ZIP_(file_format) but did not understand all the details.

One solution could be to check the first 4 bytes of the object:

>>> f.getvalue()[:4]
b'PK\x03\x04'

But I am not sure whether this holds for all kinds of zip files.
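A minimal sketch comparing the 4-byte prefix check with `zipfile.is_zipfile`, assuming the data sits in an `io.BytesIO`. The prefix check only recognizes archives that begin with a local file header (`PK\x03\x04`). An empty archive consists only of the end-of-central-directory record, which starts with `PK\x05\x06`, and an archive with data prepended (for example a self-extracting stub) does not start with `PK` at all. `zipfile.is_zipfile` looks for the end-of-central-directory record near the end of the data instead. The helper name `starts_with_zip_signature` is hypothetical.

```python
import io
import zipfile


def starts_with_zip_signature(buffer: io.BytesIO) -> bool:
    """Hypothetical helper: the question's check on the first 4 bytes only."""
    return buffer.getvalue()[:4] == b"PK\x03\x04"


# An archive with no members: only the end-of-central-directory record is written.
empty = io.BytesIO()
with zipfile.ZipFile(empty, "w"):
    pass

# An ordinary archive with one member; it starts with a local file header.
normal = io.BytesIO()
with zipfile.ZipFile(normal, "w") as zf:
    zf.writestr("hello.txt", "hello")

# The same archive with arbitrary bytes in front, like a self-extracting stub.
prepended = io.BytesIO(b"#!stub\n" + normal.getvalue())

for name, buf in [("empty", empty), ("normal", normal), ("prepended", prepended)]:
    print(f"{name}: prefix={starts_with_zip_signature(buf)} is_zipfile={zipfile.is_zipfile(buf)}")
# empty: prefix=False is_zipfile=True
# normal: prefix=True is_zipfile=True
# prepended: prefix=False is_zipfile=True
```

So the prefix check works for typical archives but misses these edge cases, while `is_zipfile` accepts all three.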

EDIT: After the discussion in the comments, I need to make the question more precise: I want to know whether it is a zip file but not an Excel file (modern .xlsx Excel files are themselves zip archives).

Upvotes: 1

Views: 923

Answers (1)

GordonAitchJay

Reputation: 4860

This is one way to check whether a buffer is a zip file but not an OOXML file (the zip-based format used by .xlsx, .docx and .pptx):

import zipfile

# buffers: your iterable of BytesIO objects
for buffer in buffers:
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as zip_file:
            try:
                # All OOXML documents contain this file
                zip_file.getinfo("[Content_Types].xml")
            except KeyError:
                # It is not an OOXML file
                pass
            else:
                # It is an OOXML file (e.g. .xlsx), so skip it
                continue

            # Do stuff with the zip file that isn't an OOXML file.
            # (zip_file.filename is None for a BytesIO, so list the members instead.)
            print(zip_file.namelist())
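The same logic can be wrapped in a reusable function. This is a sketch, not a complete classifier: the name `classify_buffer` and its three return labels are hypothetical, and it treats any archive containing `[Content_Types].xml` as OOXML, so .docx and .pptx files are excluded along with .xlsx. It also rewinds the buffer with `seek(0)`, because `is_zipfile` and `ZipFile` move the stream position. `ZipFile` does not close a file object that was passed in, so the buffer remains usable afterwards.

```python
import io
import zipfile


def classify_buffer(buffer: io.BytesIO) -> str:
    """Hypothetical helper: return "ooxml", "zip", or "other"."""
    try:
        if not zipfile.is_zipfile(buffer):
            return "other"
        with zipfile.ZipFile(buffer) as zip_file:
            names = zip_file.namelist()
        # Every OOXML package (.xlsx, .docx, .pptx) has this part at its root.
        return "ooxml" if "[Content_Types].xml" in names else "zip"
    finally:
        # Rewind so the caller can read the buffer again from the start.
        buffer.seek(0)


# A plain zip archive.
plain = io.BytesIO()
with zipfile.ZipFile(plain, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")

# A stand-in for an .xlsx: only the marker part, not a real workbook.
fake_xlsx = io.BytesIO()
with zipfile.ZipFile(fake_xlsx, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")

# Data that is not a zip archive at all.
text = io.BytesIO(b"just some plain text, not a zip archive")

for buf in (plain, fake_xlsx, text):
    print(classify_buffer(buf))
# zip
# ooxml
# other
```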

Upvotes: 1
