I am using this code to download and extract a zip file:
import io
import zipfile
import requests
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("Documents_zip")  # This is where the error occurs
Python raises this error:
BadZipFile: File name in directory '2017-08-29_Cerfa_CpC_Ombi├¿res_Lac_Th├⌐sauque.pdf' and header b'2017-08-29_Cerfa_CpC_Ombi\xc3\xa8res_Lac_Th\xc3\xa9sauque.pdf' differ.
I do not know much about the zipfile module, but it seems too strict to me: I don't think it is necessary to check that the file name in the directory matches the one in the header.
How can I extract the files without this error being raised?
EDIT 1:
I created this function to avoid raising the error. It just returns a boolean indicating whether or not the extraction was run.
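def download_zip(z, path):
    # testzip() returns None if every member can be read and passes its CRC
    # check; otherwise it catches the BadZipFile itself and returns the name
    # of the first bad member, so nothing is extracted from a damaged archive
    if z.testzip() is None:
        z.extractall(path)
        return True
    else:
        return False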
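The boolean approach works because testzip() swallows the BadZipFile. The sketch below checks that on an in-memory archive built by a hypothetical helper, make_zip_with_bad_member: its first member is named a.txt in the central directory, but its local header name was deliberately overwritten with b.txt, which triggers the same directory-versus-header name comparison that fails in the question.

```python
import io
import tempfile
import zipfile


def make_zip_with_bad_member():
    """Hypothetical fixture: the first member is 'a.txt' in the central
    directory but 'b.txt' in its local header, so reading it raises
    BadZipFile."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"hello")
        zf.writestr("good.txt", b"fine")
    data = bytearray(buf.getvalue())
    # The first local header starts at offset 0; its fixed part is 30 bytes
    # long and is followed directly by the file name
    assert data[30:35] == b"a.txt"
    data[30:35] = b"b.txt"
    return zipfile.ZipFile(io.BytesIO(bytes(data)))


def download_zip(z, path):
    # Same logic as the EDIT 1 function
    if z.testzip() is None:
        z.extractall(path)
        return True
    return False


z = make_zip_with_bad_member()
bad_member = z.testzip()  # no exception: testzip() catches BadZipFile
with tempfile.TemporaryDirectory() as tmp:
    extracted = download_zip(z, tmp)
print(bad_member, extracted)  # a.txt False
```

Note that testzip() reads the members to verify them, so the archive is read once for the check and again by extractall(), and a single bad member prevents every good member from being extracted.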
I finalised the previous function. It extracts the files of a zip archive into the folder given by path. If some files cannot be extracted, the current working directory is renamed to show the number of corrupted files, and the function also returns this number.
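import os
import zipfile


def download_zip(z, path):
    """Extract the members of ZipFile z into path, skipping unreadable ones.

    Returns the number of members that could not be extracted.
    """
    count = 0
    for my_file in z.namelist():
        if not my_file:
            # An entry without a name is counted as not extracted
            count += 1
            continue
        try:
            # extract() raises BadZipFile itself for a damaged member, so
            # there is no need to call z.testzip() (a full read) each time
            z.extract(my_file, path=path)
        except zipfile.BadZipFile:
            count += 1
    if count != 0:
        # Rename the current working directory so that its name records how
        # many files were skipped ("doc du zip non extrait" means
        # "zip documents not extracted")
        my_path = os.getcwd()
        new_path = my_path + ' - ' + str(count) + ' doc du zip non extrait'
        os.chdir(os.path.dirname(my_path))
        os.rename(my_path, new_path)
        os.chdir(new_path)
    return count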
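To try the per-member approach without renaming the working directory, the sketch below uses a simplified, hypothetical variant, extract_readable_members, that only extracts and counts. It runs on an in-memory archive (again a hypothetical fixture) whose first member has a mismatched local header name, and extracts into a temporary directory.

```python
import io
import os
import tempfile
import zipfile


def make_zip_with_bad_member():
    """Hypothetical fixture: 'a.txt' in the central directory but 'b.txt'
    in its local header, so reading it raises BadZipFile."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"hello")
        zf.writestr("good.txt", b"fine")
    data = bytearray(buf.getvalue())
    # Local header of the first member: 30 fixed bytes, then the name
    assert data[30:35] == b"a.txt"
    data[30:35] = b"b.txt"
    return zipfile.ZipFile(io.BytesIO(bytes(data)))


def extract_readable_members(z, path):
    """Extract every member that can be read into path and return how many
    could not be (download_zip's logic without the directory renaming)."""
    failed = 0
    for name in z.namelist():
        try:
            z.extract(name, path=path)
        except zipfile.BadZipFile:
            failed += 1
    return failed


with tempfile.TemporaryDirectory() as tmp:
    n_failed = extract_readable_members(make_zip_with_bad_member(), tmp)
    extracted = sorted(os.listdir(tmp))
print(n_failed, extracted)  # 1 ['good.txt']
```

The bad member leaves no file behind because extract() opens the member for reading, which is where the name check fails, before it creates the target file.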