Kundan

Reputation: 11

How to deal with key-value style tags in XML with Python

Using the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Environment
     xmlns="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:oe="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:ve="http://www.vmware.com/schema/ovfenv"
     oe:id=""
     ve:vCenterId="vm-61">
   <PlatformSection>
      <Kind>VMware ESXi</Kind>
      <Version>5.5.0</Version>
      <Vendor>VMware, Inc.</Vendor>
      <Locale>en</Locale>
   </PlatformSection>
   <PropertySection>
         <Property oe:key="ppEnv" oe:value="production"/>
         <Property oe:key="pphostname" oe:value="coolhostname"/>
   </PropertySection>
   <ve:EthernetAdapterSection>
      <ve:Adapter ve:mac="00:50:56:94:9a:56" ve:network="Service" ve:unitNumber="7"/>
   </ve:EthernetAdapterSection>
</Environment>

I would like to get the oe:value of the Property whose oe:key is "pphostname", but I could not find a clear way of achieving this.

I'm new to Python and XML, and all I have tried in Python so far is:

>>> import libxml2
>>> doc = libxml2.parseFile("test.xml")
>>> doc.xpathEval("//Property/*")
[]
>>> doc.xpathEval("//Property/@*")
[]
>>> doc.xpathEval("//Property")
[]
>>> doc.xpathEval("//*")
[<xmlNode (Environment) object at 0x7fb551e8e320>, <xmlNode (PlatformSection) object at 0x7fb551eb3a28>, <xmlNode (Kind) object at 0x7fb551daa950>, <xmlNode (Version) object at 0x7fb551daa998>, <xmlNode (Vendor) object at 0x7fb551daa9e0>, <xmlNode (Locale) object at 0x7fb551daaa28>, <xmlNode (PropertySection) object at 0x7fb551daaa70>, <xmlNode (Property) object at 0x7fb551daaab8>, <xmlNode (Property) object at 0x7fb551daab00>, <xmlNode (EthernetAdapterSection) object at 0x7fb551daab48>, <xmlNode (Adapter) object at 0x7fb551daab90>]
>>> doc.xpathEval("/Environment/PropertySection/Property[1]")
[]
>>> doc.xpathEval("/Environment/PropertySection/Property/oe:key")
Undefined namespace prefix

I'm more familiar with bash, but I would rather not parse XML with bash utilities.

Upvotes: 1

Views: 1664

Answers (3)

lmatt

Reputation: 197

See section 6.2 (namespace defaulting) of this document. Your XML declares a default namespace (xmlns="http://schemas.dmtf.org/ovf/environment/1"), so unprefixed element names such as Property belong to that namespace, and the XPath expression has to refer to it through a prefix. Below is test code using the lxml library (libxml2 should be similar).

from io import BytesIO

from lxml import etree

s = '''<?xml version="1.0" encoding="UTF-8"?>
<Environment
     xmlns="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:oe="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:ve="http://www.vmware.com/schema/ovfenv"
     oe:id=""
     ve:vCenterId="vm-61">
   <PlatformSection>
      <Kind>VMware ESXi</Kind>
      <Version>5.5.0</Version>
      <Vendor>VMware, Inc.</Vendor>
      <Locale>en</Locale>
   </PlatformSection>
   <PropertySection>
         <Property oe:key="ppEnv" oe:value="production"/>
         <Property oe:key="pphostname" oe:value="coolhostname"/>
   </PropertySection>
   <ve:EthernetAdapterSection>
      <ve:Adapter ve:mac="00:50:56:94:9a:56" ve:network="Service" ve:unitNumber="7"/>
   </ve:EthernetAdapterSection>
</Environment>'''

# Feed the parser bytes, since the document carries an encoding declaration.
f = BytesIO(s.encode('utf-8'))
tree = etree.parse(f)

# The oe prefix is bound to the same URI as the document's default namespace.
namespaces = {'oe': 'http://schemas.dmtf.org/ovf/environment/1', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 've': 'http://www.vmware.com/schema/ovfenv'}

print(tree.xpath('//oe:Property[@oe:key="pphostname"]/@oe:value', namespaces=namespaces))
# output: ['coolhostname']
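
The empty results in the question come from the default namespace: an unprefixed name in a path expression only matches elements that are in no namespace. Here is a minimal sketch of that difference, using the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages. The trimmed XML and the names `XML`, `ns`, `unqualified` and `qualified` are illustrative assumptions, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (illustrative).
XML = b'''<?xml version="1.0" encoding="UTF-8"?>
<Environment
     xmlns="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:oe="http://schemas.dmtf.org/ovf/environment/1">
   <PropertySection>
         <Property oe:key="ppEnv" oe:value="production"/>
         <Property oe:key="pphostname" oe:value="coolhostname"/>
   </PropertySection>
</Environment>'''

root = ET.fromstring(XML)
ns = {'oe': 'http://schemas.dmtf.org/ovf/environment/1'}

# Unprefixed name: searches for Property in no namespace, so nothing matches.
unqualified = root.findall('.//Property')

# Prefixed name: resolved through the ns mapping, so both elements match.
qualified = root.findall('.//oe:Property', ns)

print(len(unqualified), len(qualified))
```

The prefix used in the query does not have to be the one used in the file; only the namespace URI it maps to matters.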

Upvotes: 0

avenet

Reputation: 3043

Try using xml.dom.minidom:

from xml.dom import minidom

xml_doc = minidom.parse('test.xml')
property_items = xml_doc.getElementsByTagName("Property")

# Parentheses let the expression continue onto the next line.
condition = lambda x: (x.hasAttribute('oe:key') and
                       x.attributes['oe:key'].value == "pphostname")

matched_elements = [x for x in property_items if condition(x)]

if matched_elements:
    matched_element = matched_elements[0]
    print(matched_element.attributes['oe:value'].value)
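
The lookup above matches on the literal prefix oe, so it would miss a file that binds the same namespace to a different prefix. A possible namespace-aware variant, sketched with minidom's getElementsByTagNameNS and getAttributeNS, is shown below. The helper name `property_value`, the constant `OE_NS` and the trimmed inline XML are assumptions made for illustration.

```python
from xml.dom import minidom

OE_NS = 'http://schemas.dmtf.org/ovf/environment/1'

# Trimmed copy of the question's document (illustrative).
XML = b'''<?xml version="1.0" encoding="UTF-8"?>
<Environment
     xmlns="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:oe="http://schemas.dmtf.org/ovf/environment/1">
   <PropertySection>
         <Property oe:key="ppEnv" oe:value="production"/>
         <Property oe:key="pphostname" oe:value="coolhostname"/>
   </PropertySection>
</Environment>'''


def property_value(document, key):
    """Return the oe:value of the Property whose oe:key equals key, or None."""
    for element in document.getElementsByTagNameNS(OE_NS, 'Property'):
        # Attributes are looked up by namespace URI and local name, not prefix.
        if element.getAttributeNS(OE_NS, 'key') == key:
            return element.getAttributeNS(OE_NS, 'value')
    return None


doc = minidom.parseString(XML)
print(property_value(doc, 'pphostname'))
```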

Upvotes: 1

Anzel

Reputation: 20553

You can store the namespace URI for the oe prefix in a variable and match the key-value pair through the elements' attributes.

Here is a sample using the standard xml module:

import xml.etree.ElementTree as ET

s = '''<?xml version="1.0" encoding="UTF-8"?>
<Environment
     xmlns="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:oe="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:ve="http://www.vmware.com/schema/ovfenv"
     oe:id=""
     ve:vCenterId="vm-61">
   <PlatformSection>
      <Kind>VMware ESXi</Kind>
      <Version>5.5.0</Version>
      <Vendor>VMware, Inc.</Vendor>
      <Locale>en</Locale>
   </PlatformSection>
   <PropertySection>
         <Property oe:key="ppEnv" oe:value="production"/>
         <Property oe:key="pphostname" oe:value="coolhostname"/>
   </PropertySection>
   <ve:EthernetAdapterSection>
      <ve:Adapter ve:mac="00:50:56:94:9a:56" ve:network="Service" ve:unitNumber="7"/>
   </ve:EthernetAdapterSection>
</Environment>'''

tree = ET.fromstring(s)
oe = '{http://schemas.dmtf.org/ovf/environment/1}'

for node in tree.iter(oe+'Property'):
    # .get() returns None instead of raising KeyError if the attribute is missing
    if node.attrib.get(oe+'key') == 'pphostname':
        print(node.attrib[oe+'value'])

Result:

coolhostname
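
If several properties are needed, the same loop can collect every oe:key/oe:value pair into a dictionary in one pass. This is a minimal sketch of that idea; the names `OE`, `XML` and `properties` and the trimmed XML are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI in ElementTree's {uri}localname notation.
OE = '{http://schemas.dmtf.org/ovf/environment/1}'

# Trimmed copy of the question's document (illustrative).
XML = b'''<?xml version="1.0" encoding="UTF-8"?>
<Environment
     xmlns="http://schemas.dmtf.org/ovf/environment/1"
     xmlns:oe="http://schemas.dmtf.org/ovf/environment/1">
   <PropertySection>
         <Property oe:key="ppEnv" oe:value="production"/>
         <Property oe:key="pphostname" oe:value="coolhostname"/>
   </PropertySection>
</Environment>'''

root = ET.fromstring(XML)

# Map each Property's oe:key attribute to its oe:value attribute.
properties = {
    node.get(OE + 'key'): node.get(OE + 'value')
    for node in root.iter(OE + 'Property')
}

print(properties['pphostname'])
```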

Upvotes: 1
