neandr

Reputation: 219

Get zip file details - not from its content

After creating a zip file in Python 2, how can I get the details of the zip? I am not asking about the files it contains, but about the zip archive itself.

On Linux, opening the zip file with 'Archive Manager' lets you display these properties:

"Last modified, Archive size, Content size, Compression ratio, Number of files"

How can I get those properties from within a Python script?

Upvotes: 2

Views: 2254

Answers (2)

Ondrej K.

Reputation: 9664

This information is not available in the ZIP archive as a single structure you can access. I am not sure how Archive Manager implements it and I do not have one at hand to check, but I presume it to be a combination of a stat of the archive file itself, which gives the time of its last modification and its size. E.g. for archive ar.zip:

import os

os.stat('ar.zip').st_mtime  # last modification of the archive (seconds since the epoch)
os.stat('ar.zip').st_size   # size of the archive file in bytes
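To present these two values the way a file manager might, the timestamp can be converted to a datetime. This is only a minimal sketch; the helper name archive_file_stats is made up for illustration and is not part of any library:

```python
import datetime
import os


def archive_file_stats(path):
    """Return (last_modified, size_in_bytes) for the file at `path`.

    Hypothetical helper: it only looks at the file system entry,
    never at the contents of the archive.
    """
    st = os.stat(path)
    # st_mtime is seconds since the epoch; convert it to a local datetime
    last_modified = datetime.datetime.fromtimestamp(st.st_mtime)
    return last_modified, st.st_size
```

Calling archive_file_stats('ar.zip') would then return a datetime object and an integer byte count.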

The rest comes from iterating over the archive members' information. For a ZIP file this operation should not be prohibitively expensive: there is a directory pointing to all entries at the end of the archive, so the archive does not have to be read in its entirety.

For instance:

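import zipfile

z = zipfile.ZipFile('ar.zip')  # open the archive for reading
osize = csize = cnt = 0
for item in z.infolist():
    osize += item.file_size      # uncompressed size of this entry
    csize += item.compress_size  # compressed size of this entry
    cnt += 1
z.close()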

This gives you osize, the original (uncompressed) size of all files; csize, their compressed size in the archive; and cnt, the number of entries in the archive.

With that, you can get the compression ratio by dividing csize by osize, with one caveat. Since you mention/tag Python 2.7, do not forget to convert (at least) one of them to float to force the result to be a float as well: ratio = float(csize) / osize. On Python 3, / produces a float in any case.
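One more corner case is an archive with no entries (or only empty files), where osize is 0 and the division fails. A small sketch that covers both caveats; the function name compression_ratio and the choice to return None are assumptions made for illustration:

```python
def compression_ratio(compressed_size, original_size):
    """Return compressed/original as a float, or None if original is 0."""
    if original_size == 0:
        # nothing to compare against; avoid ZeroDivisionError
        return None
    # float() forces true division on Python 2 as well
    return float(compressed_size) / original_size
```

With the variables from the loop above this would be called as compression_ratio(csize, osize).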

You can of course wrap all of that into a convenient function you can pass an open zip archive to:

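import os

def zip_details(archive_obj):
    """Collect details about an open zipfile.ZipFile backed by a real file."""
    archive_info = {'original_size': 0,
                    'compressed_size': 0,
                    'total_entries': 0}
    # stat the underlying file once for the archive's own size and mtime
    stat = os.fstat(archive_obj.fp.fileno())
    archive_info['total_size'] = stat.st_size
    archive_info['last_change'] = stat.st_mtime
    for item in archive_obj.infolist():
        archive_info['original_size'] += item.file_size
        archive_info['compressed_size'] += item.compress_size
        archive_info['total_entries'] += 1
    # an empty archive has no meaningful ratio; avoid dividing by zero
    if archive_info['original_size']:
        archive_info['compression_ratio'] = (
            float(archive_info['compressed_size']) / archive_info['original_size'])
    else:
        archive_info['compression_ratio'] = None
    return archive_info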

and get a dictionary with the desired details in return. Alternatively, you could subclass zipfile.ZipFile and add this functionality as a method.
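The subclassing variant could look roughly like this. It is a sketch only: the class name DetailedZipFile and the method name details are invented here, and it assumes the archive was opened for reading from a path on disk, so that self.filename can be passed to os.stat:

```python
import os
import zipfile


class DetailedZipFile(zipfile.ZipFile):
    """Hypothetical ZipFile subclass that adds a details() method."""

    def details(self):
        infos = self.infolist()
        original = sum(item.file_size for item in infos)
        compressed = sum(item.compress_size for item in infos)
        # assumes the archive was opened by file name, not a file object
        stat = os.stat(self.filename)
        return {
            'total_size': stat.st_size,
            'last_change': stat.st_mtime,
            'original_size': original,
            'compressed_size': compressed,
            'total_entries': len(infos),
            # None when there is nothing to compare against
            'compression_ratio': (float(compressed) / original
                                  if original else None),
        }
```

Usage would then be along the lines of opening the archive with DetailedZipFile('ar.zip') and calling its details() method.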

The question title asks to exclude the content, but I am afraid that condition is impossible to fulfill for an existing archive, except for the overall size and the time of last modification. Everything else can only be learned by looking into the archive itself: the file count from the directory at its end, and further details from the information stored for the individual files. This is not Python-specific and holds for any tool or language used.

Upvotes: 2

neandr

Reputation: 219

When working with 'bash' (as on Linux), here is a simple method to zip a given list of files/directories and get the zip archive properties at the same time:

import os
bashCommand = "zip -r -v" \
  " " + "./my-extension.zip" \
  " " + "file1 file2 fileN dir1 dir2 dirN" \
  " " + "| grep 'total bytes=' > zip.log"
os.system(bashCommand)

Note: Of course this can be executed directly at the OS prompt, but the intent is to include the call in a bigger Python script.

Upvotes: -1
