edyvedy13

Reputation: 2296

Counting number of files with different formats in zip file using Python

I have a lot of zip files in my directory, and for each zip file I want to count how many files of each format it contains. For instance, for the zip file "nature.zip", I want the following output:

    file_name   file_format
    nature      jpg 2, png 1

So far I have managed to print the contents, but I don't know how to move forward:

    from zipfile import ZipFile
    import os
    directory = r"C:\Users\Lenovo\data_2"
    for folder, subfolders, files in os.walk(directory):
        for file in files:
            if file.endswith(".zip"):
                # opening the zip file in READ mode; join with `folder` (not `directory`)
                # so that archives found in subfolders are opened from the right place
                with ZipFile(os.path.join(folder, file), 'r') as zf:
                    # printing all the contents of the zip file
                    zf.printdir()

Thank you very much

Upvotes: 2

Views: 2393

Answers (1)

SigmaPiEpsilon

Reputation: 698

Here is an example. It groups the files inside each zip by extension, counting them in a dictionary, and prints the result. Adapt it as needed for your case.

    # Filegroup.py
    from __future__ import print_function  # makes print() behave the same on Python 2 and 3
    from zipfile import ZipFile
    from glob import glob

    print("file_name", "\t", "file_format")
    for zip_path in glob('*.zip'):
        with ZipFile(zip_path) as zf:
            files = zf.namelist()
            filecounts = {}
            for name in files:
                if name.endswith('/'):
                    continue  # skip directory entries, they are not files
                ext = name.split('.')[-1]
                if ext in filecounts:
                    filecounts[ext] += 1
                else:
                    filecounts[ext] = 1
            print(zf.filename, '\t\t', ', '.join(' '.join(map(str, elem)) for elem in filecounts.items()))
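A variant closer to the question's setup, as a sketch for Python 3.6+: it walks the directory recursively with `os.walk` as in the question, counts extensions with `collections.Counter` and `os.path.splitext`, skips directory entries via `ZipInfo.is_dir()`, and prints the archive name without `.zip`, so `nature.zip` is reported as `nature  jpg 2, png 1`. The helper names `count_extensions` and `report`, the lower-casing of extensions (so `JPG` and `jpg` are counted together) and the alphabetical ordering are my own choices, not something the answer above does.

```python
import os
import sys
from collections import Counter
from zipfile import ZipFile


def count_extensions(zip_path):
    """Return a Counter mapping extension -> number of file members in the zip."""
    counts = Counter()
    with ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries are not files
            # splitext only looks at the final dot of the last path component;
            # members without an extension are counted under ''
            ext = os.path.splitext(info.filename)[1].lstrip('.').lower()
            counts[ext] += 1
    return counts


def report(directory):
    """Walk `directory` recursively and return (archive name, summary) rows."""
    rows = []
    for folder, _subfolders, files in os.walk(directory):
        for file in files:
            if file.lower().endswith('.zip'):
                counts = count_extensions(os.path.join(folder, file))
                # alphabetical order by extension, e.g. "jpg 2, png 1"
                summary = ', '.join(f'{ext} {n}' for ext, n in sorted(counts.items()))
                rows.append((os.path.splitext(file)[0], summary))
    return rows


if __name__ == '__main__':
    # directory to scan: first command-line argument, or the current directory
    directory = sys.argv[1] if len(sys.argv) > 1 else '.'
    print('file_name\tfile_format')
    for name, summary in report(directory):
        print(f'{name}\t{summary}')
```

Usage (the script name is hypothetical): `python count_formats.py C:\Users\Lenovo\data_2`. Note that `os.walk` does not guarantee any particular order, so sort the rows if you need a stable listing.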

Test:

    $ zipinfo -1 A.zip
    a.txt
    b.txt
    c.jpg
    k.png
    $ zipinfo -1 B.zip
    g.md
    h.txt
    e.png
    f.png
    d.jpg
    $ python Filegroup.py 
    file_name       file_format
    A.zip           txt 2, png 1, jpg 1
    B.zip           md 1, txt 1, jpg 1, png 2
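If you want to try the counting logic without preparing archives on disk or having `zipinfo` installed, one option is to build the archive in memory, since `ZipFile` also accepts a seekable file-like object such as `io.BytesIO`. This sketch recreates the member names listed for A.zip above (with empty contents, which is enough for counting) and lists the extensions with `Counter.most_common`, which orders them by count and keeps first-seen order for ties.

```python
import io
import os
from collections import Counter
from zipfile import ZipFile

# Build an in-memory archive with the same member names as A.zip above
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    for name in ['a.txt', 'b.txt', 'c.jpg', 'k.png']:
        zf.writestr(name, b'')  # empty contents are enough for counting

# ZipFile does not close a file object it was given, so the buffer can be reopened for reading
buffer.seek(0)
with ZipFile(buffer) as zf:
    counts = Counter(os.path.splitext(name)[1].lstrip('.') for name in zf.namelist())

summary = ', '.join(f'{ext} {n}' for ext, n in counts.most_common())
print(summary)  # prints: txt 2, jpg 1, png 1
```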

Upvotes: 3
