Romeo

Reputation: 147

python 2.7: accessing comment in XML

I have been using ElementTree to read an XML file and am able to parse it properly. But I don't know how to read comments, especially where the child context is important. In this specific case, I would like to read the comment for NY (that air, bus, and train are all available) and store it in a dictionary (name: comment).

<spirit:st>
     .....
     <spirit:fa>
            <spirit:name>NY</spirit:name>
            <spirit:den>3</spirit:den>
            <spirit:metro>true</spirit:metro>
            <!-- air, bus, train all available -->
            <spirit:access>air</spirit:access>
     </spirit:fa>
     .....

My code:

for state in data.findall('spirit:st', IPXACT_MAP):
    for city in state.findall('spirit:fa', IPXACT_MAP):
        access = city.find('spirit:access', IPXACT_MAP) 
        # read the comment and set city_access_d[<city name>] = comment

Upvotes: 0

Views: 62

Answers (1)

Daniel Haley

Reputation: 52858

If you can use lxml, you should be able to select the comment() with XPath.

Here's an example. I've removed the namespace prefixes to simplify it.

from lxml import etree

xml = """
<st>
    <fa>
        <name>NY</name>
        <den>3</den>
        <!-- ignore me -->
        <metro>true</metro>
        <!-- air, bus, train all available -->
        <access>air</access>
    </fa>
</st>
"""

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.fromstring(xml, parser=parser)

city_access_d = {}
for city in tree.xpath(".//fa"):
    name = city.xpath("name")[0].text
    comment = city.xpath("comment()[following-sibling::node()[1][self::access]]")[0]
    city_access_d[name] = comment.text.strip()

print city_access_d

printed output...

{'NY': 'air, bus, train all available'}

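You could also use the following XPath if for some reason you didn't want to create the XMLParser. Without remove_blank_text=True, whitespace text nodes sit between the comment and access, so this version skips text nodes before checking the next sibling...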

comment = city.xpath("comment()[following-sibling::node()[not(self::text())][1][self::access]]")[0]
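
If lxml isn't an option, here is a rough standard-library sketch of the same idea. It assumes Python 3.8 or later (not the 2.7 from the question), because that is when ElementTree's TreeBuilder gained the insert_comments flag. Instead of XPath, it walks each fa element's children and keeps the comment that comes immediately before access. The previous variable is just a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

xml = """
<st>
    <fa>
        <name>NY</name>
        <den>3</den>
        <!-- ignore me -->
        <metro>true</metro>
        <!-- air, bus, train all available -->
        <access>air</access>
    </fa>
</st>
"""

# Assumption: Python 3.8+, where TreeBuilder accepts insert_comments=True
# and keeps comments in the tree instead of dropping them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(xml, parser=parser)

city_access_d = {}
for city in root.iter("fa"):
    name = city.find("name").text
    previous = None  # the child element seen just before the current one
    for child in city:
        # Comment nodes are elements whose tag is the ET.Comment factory.
        is_comment_before = previous is not None and previous.tag is ET.Comment
        if child.tag == "access" and is_comment_before:
            city_access_d[name] = previous.text.strip()
        previous = child

print(city_access_d)
```

Whitespace between elements ends up in .text and .tail rather than as separate child nodes here, so no blank-text handling is needed. With the question's namespaced XML, the tag names would have to be the fully qualified {uri}access form.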

Upvotes: 1
