Asif Iqbal

Reputation: 511

Comparing two text files inside zip files using python

I want to compare two text files that have the same name and the same relative path inside two different zip files, using Python.

I have searched for ways to do this, but none of the top solutions I found work in my case.

My code:

from zipfile import ZipFile
from pathlib import Path

with ZipFile(zip_path1) as z1, ZipFile(zip_path2) as z2:
    file1_paths = [Path(filepath) for filepath in z1.namelist()]
    file2_paths = [Path(filepath) for filepath in z2.namelist()]
    cmn = list(set(file1_paths).intersection(set(file2_paths)))
    common_files = [filepath for filepath in cmn if str(filepath).endswith(('.txt', '.sh'))]

    for f in common_files:
        with z1.open(f, 'r') as f1, z2.open(f, 'r') as f2:
            if f1.read() != f2.read(): # Also used io.TextIOWrapper(f1).read() here
                print('Difference found for {filepath}'.format(filepath=str(f)))

Note:

I have used pathlib for the paths here. In the line with z1.open(f, 'r')..., if I pass a pathlib path instead of a hardcoded string, I get <class 'KeyError'>: "There is no item named WindowsPath('SomeFolder/somefile.txt') in the archive".

Moreover, even if I hardcode the path, the buffer read from the file always comes back empty, so the comparison does not actually work.

I am stuck on this curious case, and any help is much appreciated.

Upvotes: 0

Views: 407

Answers (1)

match

Reputation: 11060

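You should be able to achieve this without using Path. Member names inside a zip archive are plain strings that use forward slashes on every platform, so they don't need to be treated in an OS-specific way. ZipFile.open() looks a member up by that string (or by a ZipInfo object), which is why passing a Path object raises the KeyError. The strings returned by namelist() can be used both for the comparison and as arguments to open(), as follows: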

from zipfile import ZipFile

with ZipFile(zip_path1) as z1, ZipFile(zip_path2) as z2:
    common_files = [
        x for x in set(z1.namelist()) & set(z2.namelist())
        if x.endswith(('.txt', '.sh'))
    ]
    # print(common_files)

    for f in common_files:
        with z1.open(f) as f1, z2.open(f) as f2:
            if f1.read() != f2.read():
                print('Difference found for {filepath}'.format(filepath=str(f)))
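
For anyone who wants to try this without real archives on disk, here is a minimal self-contained sketch of the same idea. The names differing_members and make_zip are hypothetical helpers introduced only for this illustration, and the in-memory archives are made-up demo data; ZipFile.read() is used as a shortcut for open() followed by read().

```python
from io import BytesIO
from zipfile import ZipFile


def differing_members(zip_a, zip_b, suffixes=('.txt', '.sh')):
    """Return the sorted names of members found in both archives whose bytes differ."""
    with ZipFile(zip_a) as z1, ZipFile(zip_b) as z2:
        # namelist() yields plain strings, so they can be intersected directly
        common = set(z1.namelist()) & set(z2.namelist())
        return sorted(
            name for name in common
            if name.endswith(suffixes) and z1.read(name) != z2.read(name)
        )


def make_zip(members):
    """Build an in-memory zip from a {name: text} mapping (demo helper only)."""
    buffer = BytesIO()
    with ZipFile(buffer, 'w') as z:
        for name, text in members.items():
            z.writestr(name, text)
    buffer.seek(0)
    return buffer


# Hypothetical demo data: one shared file differs, one is identical,
# and one exists only in the first archive.
first = make_zip({
    'SomeFolder/somefile.txt': 'hello\n',
    'run.sh': 'echo hi\n',
    'only_in_first.txt': 'x',
})
second = make_zip({
    'SomeFolder/somefile.txt': 'hello world\n',
    'run.sh': 'echo hi\n',
})

print(differing_members(first, second))  # ['SomeFolder/somefile.txt']
```

ZipFile accepts either a path or a file-like object, so the same function should also work with zip_path1 and zip_path2 from the question.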

Upvotes: 1
