Niek

Reputation: 51

How to access an XML within a zip

I have the following structure: folder/some_files.zip/files.xml

I can extract the value I need from the XML as shown below. This works when the XML sits in a normal folder, but I haven't been able to make it work with zip files.

tree = ET.parse('filename.xml')
root = tree.getroot()
return root.find('.//Cloud_Coverage_Assessment').text

How can I make this work for multiple zips? This is my current code:

def RetrieveCloudCover(filename, root):
    for zip in folder:  
        unzipped_file = zipfile.ZipFile(filename, "r")
        tree = ET.parse(unzipped_file)
        root = tree.getroot()
        return root.find('.//Cloud_Coverage_Assessment').text

Right now I get the error:

FileNotFoundError: [Errno 2] No such file or directory: 'filename'

Upvotes: 0

Views: 1730

Answers (1)

Ssayan

Reputation: 1043

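Your function mixes up several things: it loops over folder, which is never defined, yet opens the same filename on every iteration, and it passes the ZipFile object itself to ET.parse, which expects a path or an open file containing XML. A zip file is not a folder you can look into in the "normal" way, even though it looks like one in an OS file explorer.

One way to handle this is to extract the zip file into a folder and then iterate over the extracted XML files. Something like this: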

import glob
import os
import zipfile
import xml.etree.ElementTree as ET

zip_filename = "folder/some_files.zip"  # path taken from the question
unzip_folder = "unzipped_files"
# Open the zip file and extract its contents to "unzip_folder"
with zipfile.ZipFile(zip_filename) as zip_file:
    zip_file.extractall(unzip_folder)

# Get the list of XML files that were at the top level of your zip file
list_xml_files = glob.glob(os.path.join(unzip_folder, "*.xml"))
# Iterate through the list of XML paths and parse each one
for xml_file in list_xml_files:
    tree = ET.parse(xml_file)
    root = tree.getroot()
    # do what you need with the XML, e.g. read the cloud cover value
    print(root.find('.//Cloud_Coverage_Assessment').text)
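
If extracting to disk is not wanted, the XML can also be read straight from the archive, since ZipFile.open returns a file-like object that ET.parse accepts. The following is only a sketch under some assumptions: the function names retrieve_cloud_cover and retrieve_all are made up for illustration, each zip is assumed to hold at least one member ending in .xml, and the Cloud_Coverage_Assessment element is assumed to carry no XML namespace.

```python
import glob
import os
import xml.etree.ElementTree as ET
import zipfile


def retrieve_cloud_cover(zip_path, tag="Cloud_Coverage_Assessment"):
    """Return the text of the first `tag` element found in any XML member, or None.

    Hypothetical helper: the name and the `tag` parameter are illustrative.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".xml"):
                continue  # skip non-XML members
            # Read the member directly from the archive, without extracting it.
            with archive.open(member) as xml_stream:
                root = ET.parse(xml_stream).getroot()
            element = root.find(f".//{tag}")
            if element is not None:
                return element.text
    return None


def retrieve_all(folder):
    """Map each zip file name in `folder` to its cloud cover value (or None)."""
    results = {}
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        results[os.path.basename(zip_path)] = retrieve_cloud_cover(zip_path)
    return results
```

Calling retrieve_all("folder") would then return a dictionary with one entry per zip file in that folder. Returning None instead of raising when the element is missing is a design choice for this sketch, so that one incomplete archive does not stop the whole loop.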

Upvotes: 1
