André Fernandes

Reputation: 2585

Python lxml: how to fetch XML tag names with xpath selector?

I'm trying to parse the following XML using Python and lxml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/bind9.xsl"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256
          </TotalUse>
          <InUse>835252452
          </InUse>
          <BlockSize>598212608
          </BlockSize>
          <ContextSize>52670016
          </ContextSize>
          <Lost>0
          </Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>

The goal is to extract the tag name and text of every element under bind/statistics/memory/summary in order to produce the following mapping:

TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0

I've managed to extract the element values, but I can't figure out the xpath expression to get the element tag names.

A sample script:

from lxml import etree as et

def main():

    xmlfile = "bind982.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "??????" ## what to put here...?
    value_selector = "text()"

    with open(xmlfile, "r") as data:
        xmldata = et.parse(data)

        etree = xmldata.getroot()

        statlist = etree.xpath(location)

        for stat in statlist:
            label = stat.xpath(label_selector)[0]
            value = stat.xpath(value_selector)[0]
            print "{0}: {1}".format(label, value)

if __name__ == '__main__':
    main()

I know I could use label = stat.tag instead of stat.xpath(), but the script must be generic enough to also process other pieces of XML where the label selector is different.

What xpath selector would return an element's tag name?

Upvotes: 1

Views: 4600

Answers (2)

Parfait

Reputation: 107587

Use XPath's name() function, and drop the [0] index on the label, since name() returns a string rather than a list.

from lxml import etree as et

def main():

    xmlfile = "ExtractXPathTagName.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "name()"  # XPath function: name of the context node
    value_selector = "text()"

    with open(xmlfile, "r") as data:
        xmldata = et.parse(data)

        etree = xmldata.getroot()

        statlist = etree.xpath(location)

        for stat in statlist:
            label = stat.xpath(label_selector)
            value = stat.xpath(value_selector)[0]
            print("{0}: {1}".format(label, value).strip())

if __name__ == '__main__':
    main()

Output

TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0
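
To check the return-type point without a file on disk, here is a minimal sketch that parses a trimmed inline copy of the XML. The SAMPLE constant and the describe() helper are names made up for this illustration; they are not part of the original script.

```python
from lxml import etree as et

# Trimmed copy of the question's XML (only two of the five counters).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256
          </TotalUse>
          <Lost>0
          </Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>"""


def describe(xml_bytes, location, label_selector="name()", value_selector="text()"):
    """Return (label, value) pairs for every node matched by `location`."""
    root = et.fromstring(xml_bytes)
    pairs = []
    for stat in root.xpath(location):
        # name() evaluates to a single string, so no [0] here.
        label = stat.xpath(label_selector)
        # text() evaluates to a list of text nodes, so take the first one.
        value = stat.xpath(value_selector)[0]
        # The source XML has a newline before each closing tag; strip it.
        pairs.append((label, value.strip()))
    return pairs


pairs = describe(SAMPLE, "bind/statistics/memory/summary/*")
for label, value in pairs:
    print("{0}: {1}".format(label, value))
```

Because both selectors stay plain strings, a different document can pass a different label_selector (for example "@name" plus an index) without touching the loop, which is the generality the question asks for.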

Upvotes: 2

Martin Honnen

Reputation: 167516

I don't think you need XPath for the two values: element nodes have the properties tag and text, so you can use, for instance, a list comprehension:

[(element.tag, element.text) for element in etree.xpath(location)]

Or, if you really want to use XPath:

result = [(element.xpath('name()'), element.xpath('string()')) for element in etree.xpath(location)]

You could of course also construct a list of dictionaries:

result = [{ element.tag : element.text } for element in etree.xpath(location)]

or

result = [{ element.xpath('name()') : element.xpath('string()') } for element in etree.xpath(location)]
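
If a single mapping is more convenient than a list of one-entry dictionaries, a dict comprehension works the same way. The following is only a sketch under assumptions: SAMPLE is a made-up, trimmed fragment rooted at summary, so the location collapses to "*", and the values are stripped because the source XML has a newline before each closing tag.

```python
from lxml import etree as et

# Hypothetical trimmed fragment; the real document nests <summary> deeper.
SAMPLE = b"""<summary>
  <TotalUse>1232952256
  </TotalUse>
  <InUse>835252452
  </InUse>
</summary>"""

root = et.fromstring(SAMPLE)

# Property-based variant: tag and text, with surrounding whitespace removed.
mapping = {el.tag: (el.text or "").strip() for el in root.xpath("*")}

# XPath-only variant: normalize-space() trims the element's string value.
xpath_mapping = {
    el.xpath("name()"): el.xpath("normalize-space()")
    for el in root.xpath("*")
}

print(mapping)
```

The two variants agree for this document. They can differ once namespaces are involved: el.tag then uses lxml's {namespace-uri}localname form, while name() returns the prefixed name as written in the source.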

Upvotes: 0
