Reputation: 219
After creating a zip file in Python 2, how can I get the details of the zip? I am not asking about the files it contains, but about the zip itself.
On Linux, opening the zip file with 'Archive Manager' displays these properties:
"Last modified, Archive size, Content size, Compression ratio, Number of files"
How to get those properties from within a python script?
Upvotes: 2
Views: 2254
Reputation: 9664
This information is not available in the ZIP archive as a single structure to access. I am not sure how Archive Manager implements it, and I do not have one around to check, but I presume it is a combination of a stat of the archive itself, to retrieve the time of its last modification and its size. E.g. for archive ar.zip:
import os
os.stat('ar.zip').st_mtime # last modification of the archive
os.stat('ar.zip').st_size # size of the archive
And iterating over the archive members' information for the rest. For a ZIP file this operation should not be prohibitively expensive: there is a directory pointing to all entries at the end of the archive, so the archive does not have to be read in its entirety.
For instance:
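import zipfile

z = zipfile.ZipFile('ar.zip')
osize = csize = cnt = 0
for item in z.infolist():
    osize += item.file_size      # uncompressed size of this entry
    csize += item.compress_size  # size this entry occupies in the archive
    cnt += 1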
will give you osize with the original (uncompressed) size of all files, csize with their compressed size in the archive, and cnt with the number of entries in the archive.
With that, you can get the compression ratio by dividing csize by osize, with one caveat. Since you mention/flag using Python 2.7, do not forget to convert at least one of them to float to force the result to be a float as well: ratio = float(csize) / osize. On Python 3, / produces a float in any case.
You can of course wrap all of that into a convenient function you can pass an open zip archive to:
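import os

def zip_details(archive_obj):
    """Return a dict of details for an open zipfile.ZipFile backed by a file on disk."""
    archive_info = {'original_size': 0,
                    'compressed_size': 0,
                    'total_entries': 0}
    # A single fstat call on the underlying file gives both size and mtime.
    stat = os.fstat(archive_obj.fp.fileno())
    archive_info['total_size'] = stat.st_size
    archive_info['last_change'] = stat.st_mtime
    for item in archive_obj.infolist():
        archive_info['original_size'] += item.file_size
        archive_info['compressed_size'] += item.compress_size
        archive_info['total_entries'] += 1
    # float() avoids integer division on Python 2; an archive with no data has no ratio.
    if archive_info['original_size']:
        archive_info['compression_ratio'] = (float(archive_info['compressed_size'])
                                             / archive_info['original_size'])
    else:
        archive_info['compression_ratio'] = None
    return archive_info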
and get a dictionary with the desired details in return. Or you could subclass zipfile.ZipFile and add this functionality as a method.
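The subclassing option could look roughly like the sketch below. The class name DetailedZipFile and the method name details are made up for illustration, and the code assumes the archive was opened from a real file on disk, since it relies on the fp attribute the same way the function above does (an archive opened from an in-memory buffer has no file descriptor to stat).

```python
import os
import zipfile


class DetailedZipFile(zipfile.ZipFile):
    """zipfile.ZipFile plus a details() method (hypothetical name)."""

    def details(self):
        # Assumes self.fp wraps a real file, so fileno() is available.
        stat = os.fstat(self.fp.fileno())
        entries = self.infolist()
        original = sum(item.file_size for item in entries)
        compressed = sum(item.compress_size for item in entries)
        return {
            'total_size': stat.st_size,       # size of the archive file itself
            'last_change': stat.st_mtime,     # last modification (epoch seconds)
            'original_size': original,        # uncompressed size of all entries
            'compressed_size': compressed,    # compressed size of all entries
            'total_entries': len(entries),
            # float() keeps this a true division on Python 2 as well;
            # None when there is no data to compute a ratio from.
            'compression_ratio': float(compressed) / original if original else None,
        }
```

It would then be used like a regular ZipFile, for example DetailedZipFile('ar.zip').details().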
In the question title you've expressed a limitation that excludes using the content, but I am afraid that condition is impossible to fulfill for an existing archive, except for overall size and time of last modification. Everything else can really only be learned by looking into the archive itself: the file count comes from the directory at its end, and further details come from the information stored for individual files. This is not Python specific and holds for any tool or language used.
Upvotes: 2
Reputation: 219
If you are working with 'bash' (e.g. on Linux), here is a simple method to zip a given list of files/directories and get the zip archive properties:
import os
bashCommand = "zip -r -v" \
" " + "./my-extension.zip" \
" " + "file1 file2 fileN dir1 dir2 dirN" \
" " + "| grep 'total bytes=' > zip.log"
os.system(bashCommand)
Note: this can of course be executed directly at the OS prompt, but the intent is to include the call in a bigger Python script.
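For comparison, here is a rough sketch of getting similar totals without shelling out, using only the standard zipfile module. The function name zip_paths is invented for this example, it does not reproduce the exact 'total bytes=' line that zip -v prints, and it assumes the output archive is not located inside one of the directories being zipped.

```python
import os
import zipfile


def zip_paths(archive_path, paths):
    """Zip files and directories (recursively); return (original, compressed) byte totals."""
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            if os.path.isdir(path):
                # Walk the directory tree and add every regular file found.
                for root, _dirs, files in os.walk(path):
                    for name in files:
                        archive.write(os.path.join(root, name))
            else:
                archive.write(path)
    # Reopen read-only so the sizes come from the finished archive's directory.
    with zipfile.ZipFile(archive_path) as archive:
        entries = archive.infolist()
        original = sum(item.file_size for item in entries)
        compressed = sum(item.compress_size for item in entries)
    return original, compressed
```

A call such as zip_paths('./my-extension.zip', ['file1', 'dir1']) would then return the two totals directly instead of writing them to zip.log.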
Upvotes: -1