Reputation: 638
I have a large zip file (40 GB) that I am trying to extract, but some of the files in it are corrupted. Whenever I call extractall(), I get the error BadZipfile: Bad CRC-32 for file 'filename.jpg'. When I use testzip(), I can see only the first corrupt file.
Here is the code I am using:
import zipfile
path_to_zip_file = "data.zip"
directory_to_extract_to = "directory/"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    print(zip_ref.testzip())
    zip_ref.extractall(directory_to_extract_to)
My question: is there a way to ignore or remove the corrupted files so that the extraction can continue?
Upvotes: 0
Views: 1352
Reputation: 2569
Extract files one by one, catch possible exceptions and continue with the next file:
import zipfile
with zipfile.ZipFile("data.zip", "r") as zip_ref:
    for name in zip_ref.namelist():
        try:
            zip_ref.extract(name, "directory/")
        except zipfile.BadZipFile as e:
            # Report the corrupt member and move on to the next one.
            print(e)
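If you also want to know afterwards which files were skipped, the same loop can collect the failing names. The following is only a sketch under a few assumptions: the function name extract_skipping_corrupt is made up for illustration, the partial output path is guessed as dest_dir joined with the member name (extract() may sanitize unusual names differently), and zlib.error is caught as well because a damaged compressed stream can raise it instead of BadZipFile.

```python
import os
import zipfile
import zlib


def extract_skipping_corrupt(zip_path, dest_dir):
    """Extract every readable member and return the names that failed.

    Hypothetical helper, not part of the zipfile module.
    """
    failed = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            try:
                zip_ref.extract(info, dest_dir)
            except (zipfile.BadZipFile, zlib.error) as e:
                print(e)
                failed.append(info.filename)
                # A failed extract may leave a partially written file
                # behind. Assumption: it sits at dest_dir/<member name>.
                partial = os.path.join(dest_dir, info.filename)
                if os.path.isfile(partial):
                    os.remove(partial)
    return failed


# Example call (paths taken from the question):
# skipped = extract_skipping_corrupt("data.zip", "directory/")
```

The returned list can then be logged or used to re-download just the damaged files. Note that the CRC check only fires once a member has been read to the end, so a corrupt file is detected after most of it has already been written, which is why the sketch cleans up the leftover.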
Upvotes: 1