Reputation: 12162
I have to deal with BytesIO objects. Some of them are "regular" files, but some of them are compressed via ZipFile. I need to tell the two apart.
I was looking into https://en.wikipedia.org/wiki/ZIP_(file_format) but did not understand all the details.
One solution could be to check the first 4 bytes of the object:
>>> f.getvalue()[:4]
b'PK\x03\x04'
But I am not sure if this is true for all kinds of zip file formats.
EDIT: After discussion in the comments, the question must be made more precise. I want to know whether it is a zip file but not an Excel file (Excel files are zip files as well).
Upvotes: 1
Views: 923
Reputation: 4860
This is one way to check whether it is a zip file but not an OOXML file (the format behind .xlsx, .docx, etc.):
    import zipfile

    # buffers: an iterable of BytesIO objects
    for buffer in buffers:
        if zipfile.is_zipfile(buffer):
            with zipfile.ZipFile(buffer) as zip_file:
                try:
                    # All OOXML documents contain this file
                    zip_file.getinfo(name="[Content_Types].xml")
                except KeyError:
                    # It is not an OOXML file
                    pass
                else:
                    # It is an OOXML file
                    continue
                # Do stuff with the zip file that isn't an OOXML file
                print(zip_file.filename)
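If it helps, the same check can be wrapped in a small function and tried on in-memory data. This is only a sketch: `classify` and `make_zip` are hypothetical helper names, and the test archives are made up for illustration. It also touches on the first-4-bytes idea from the question: an empty archive written by `zipfile` starts with `b'PK\x05\x06'` rather than `b'PK\x03\x04'`, so `zipfile.is_zipfile` is probably the safer test.

```python
import io
import zipfile

# Member that every OOXML package (.xlsx, .docx, ...) contains
OOXML_MARKER = "[Content_Types].xml"


def classify(buffer: io.BytesIO) -> str:
    """Return 'ooxml', 'zip', or 'other' (hypothetical helper)."""
    if not zipfile.is_zipfile(buffer):
        return "other"
    with zipfile.ZipFile(buffer) as zf:
        return "ooxml" if OOXML_MARKER in zf.namelist() else "zip"


def make_zip(names):
    """Build an in-memory zip with the given member names (test data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "x")
    return buf


plain = io.BytesIO(b"just some text")
archive = make_zip(["a.txt"])
workbook_like = make_zip([OOXML_MARKER, "xl/workbook.xml"])
empty = make_zip([])

print(classify(plain), classify(archive), classify(workbook_like))
# The empty archive is still a zip, but has a different leading signature
print(classify(empty), empty.getvalue()[:4])
```

Checking `namelist()` instead of catching `KeyError` from `getinfo` is just a stylistic choice here; both look for the same member.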
Upvotes: 1