Reputation: 35
I have a zip file containing about 3 million HTML files, and I would like to extract ~4000 of them. Is there a way to extract specific files without unzipping the entire zip file using Python?
Any leads would be appreciated! Thanks in advance.
Edit: My bad, I should have elaborated on the question. I have a list of all the HTML filenames that need to be extracted, but they are spread out over 12 zip files. How do I iterate through each zip file, extract the matching HTML files, and get the final list of extracted HTML files?
Upvotes: 0
Views: 718
Reputation: 2546
Let's say you have the list of all the HTML file names to be extracted; then you can try this out. It iterates over each zip file and extracts only the members whose names appear in that list. It may require slight modification for your setup.
from zipfile import ZipFile

listOfZipFiles = ['sample1.zip', 'sample2.zip', 'sample3.zip',  # ...
                  'sample12.zip']
# A set keeps each membership test fast when an archive has millions of members
fileNamesToBeExtracted = {'file1.html', 'file2.html',  # ...
                          'filen.html'}

for zipFileName in listOfZipFiles:
    # Create a ZipFile object and open the current zip file in read mode
    with ZipFile(zipFileName, 'r') as zipObj:
        # Get a list of all archived file names from the zip
        listOfFileNames = zipObj.namelist()
        # Iterate over the file names
        for fileName in listOfFileNames:
            # Check if this member is one of the file names to be extracted
            if fileName in fileNamesToBeExtracted:
                # Extract a single file from the zip into the current directory
                zipObj.extract(fileName)
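To also get the final list of extracted files that the question asks for, the same loop can be wrapped in a function that records what it extracts. This is only a minimal sketch under some assumptions: the function name `extract_matching` and the `dest_dir` parameter are hypothetical, the wanted names are assumed to match the member names exactly as stored in the archives (including any directory prefix), and a name that occurs in several archives is taken only from the first one.

```python
from zipfile import ZipFile


def extract_matching(zip_paths, wanted_names, dest_dir="extracted"):
    """Extract only the wanted members; return (extracted paths, names not found)."""
    remaining = set(wanted_names)  # set lookup keeps each membership test cheap
    extracted = []
    for zip_path in zip_paths:
        if not remaining:  # stop early once every wanted file has been found
            break
        with ZipFile(zip_path) as archive:
            # namelist() reads the archive's directory, not the file contents
            for member in archive.namelist():
                if member in remaining:
                    # extract() returns the path of the file it created
                    extracted.append(archive.extract(member, path=dest_dir))
                    remaining.discard(member)
    return extracted, sorted(remaining)
```

Calling `extract_matching(listOfZipFiles, fileNamesToBeExtracted)` would then return the paths written under `dest_dir`, plus any names that were not found in any of the archives, which makes it easy to check whether all ~4000 files were located.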
Upvotes: 3