Serhat

Reputation: 638

How to ignore or remove corrupt files in a zip in Python

I have a large zip file (40 GB) that I am trying to extract, but some of the files in it are corrupted. Whenever I call extractall(), I get the error BadZipfile: Bad CRC-32 for file 'filename.jpg'. With testzip() I can see only the first corrupt file. Here is the code I am using:

import zipfile

path_to_zip_file = "data.zip"
directory_to_extract_to = "directory/"

with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    print(zip_ref.testzip())
    zip_ref.extractall(directory_to_extract_to)

Is there a way to ignore or remove the corrupted files so that I can continue the unzipping process?

Upvotes: 0

Views: 1352

Answers (1)

Wups

Reputation: 2569

Extract the files one by one, catch any exception that a corrupt member raises, and continue with the next file:

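import zipfile

with zipfile.ZipFile("data.zip", "r") as zip_ref:
    for name in zip_ref.namelist():
        try:
            # extract() checks the CRC-32 of each member as it is written out
            zip_ref.extract(name, "directory/")
        except zipfile.BadZipFile as e:
            # Report the corrupt member and move on to the next one
            print(e)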
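
If you also want a list of what was skipped, the loop above can be wrapped in a small function. This is only a sketch: the name extract_skipping_corrupt is made up for illustration, catching zlib.error as well is an extra precaution for members whose compressed stream itself is damaged, and the cleanup step assumes that member names are plain relative paths, so that the output file sits at dest_dir joined with the member name. A member that fails its CRC check may leave a partly written file behind, which is why the sketch tries to delete it.

```python
import os
import zipfile
import zlib


def extract_skipping_corrupt(zip_path, dest_dir):
    """Extract every readable member and return the names that failed.

    Hypothetical helper, not part of the zipfile module.
    """
    failed = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            try:
                zip_ref.extract(info, dest_dir)
            except (zipfile.BadZipFile, zlib.error) as exc:
                print(f"Skipping {info.filename}: {exc}")
                failed.append(info.filename)
                # Assumption: the member name is a plain relative path,
                # so a leftover partial file would be at this location.
                leftover = os.path.join(dest_dir, info.filename)
                if os.path.isfile(leftover):
                    os.remove(leftover)
    return failed
```

Calling extract_skipping_corrupt("data.zip", "directory/") would then return the names of the members that could not be extracted, which you can log or try to re-download separately.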

Upvotes: 1
