Bruce Banner

Reputation: 331

Extract name, value from XML file using xml.etree.ElementTree

I am having trouble extracting a (name, value) pair from XML where name == 'mykey', using Python's xml.etree.ElementTree library. The pseudo-code for what I want is:

if (name.text == 'mykey'):
   print value.text

Here value is the text of the <value> element paired with mykey, which in this example is XX111.

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE charles-session SYSTEM "https://www.charlesproxy.com/dtd/charles-session-1_2.dtd">

<charles-session>
<header>
<name>Content-Length</name>
<value>10804</value>
</header>
<header>
<name>Date</name>
<value>Wed, 13 Oct 2021 22:02:42 GMT</value>
</header>
<header>
<name>mykey</name>
<value>XX111</value>
</header>
<header>
<name>Accept-Language</name>
<value>en-US;q=1.0, el-US;q=0.9</value>
</header>

I can get the 'mykey' text printed, but I don't know how to say "now give me the text of the value right after 'mykey'".

for name in root_node.iter('name'):
    if re.match(r'mykey', name.text):
        print(name.text)

Upvotes: 0

Views: 950

Answers (1)

simpleApp

Reputation: 3158

Get the element's value and the line number at which the element appears.

See the comments in the code for an explanation:

# Data setup
from io import StringIO
xml_data="""\
<?xml version="1.0" encoding="UTF-8"?>
<charles-session>
    <header>
    <name>Content-Length</name>
    <value>10804</value>
    </header>
    <header>
    <name>Date</name>
    <value>Wed, 13 Oct 2021 22:02:42 GMT</value>
    </header>
    <header>
    <name>mykey</name>
    <value>XX111</value>
    </header>
    <header>
    <name>Accept-Language</name>
    <value>en-US;q=1.0, el-US;q=0.9</value>
    </header>
</charles-session>
"""

Now find the element you are looking for and print its value.

import xml.etree.ElementTree as ET

# parse the string and get the root element
root = ET.parse(StringIO(xml_data)).getroot()
# visit every <header>; its first child is <name>, its second is <value>
for each in root.findall(".//header"):
    if each[0].text == "mykey":
        print(each[1].text)
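
As a variation on the loop above, here is a minimal sketch that looks the children up by tag name instead of by position, so it does not depend on <name> coming before <value>. The helper name header_value and the shortened sample XML are illustrative assumptions, not part of the original answer. The last line shows the same lookup written with the limited XPath predicate syntax that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Shortened sample data (assumption: same structure as the question's file)
xml_data = """<charles-session>
  <header><name>Date</name><value>Wed, 13 Oct 2021 22:02:42 GMT</value></header>
  <header><name>mykey</name><value>XX111</value></header>
</charles-session>"""

root = ET.fromstring(xml_data)


def header_value(root, key):
    """Return the <value> text of the first <header> whose <name> equals key."""
    for header in root.iter("header"):
        # findtext returns None when the child is missing, so this never raises
        if header.findtext("name") == key:
            return header.findtext("value")
    return None


print(header_value(root, "mykey"))  # XX111

# Same lookup with an XPath predicate: [name='mykey'] keeps only headers
# that have a <name> child whose text is exactly 'mykey'
print(root.findtext(".//header[name='mykey']/value"))  # XX111
```

Both calls return None when no header matches, which is easier to handle than an IndexError from positional access.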

The following is adapted from a line-number-tracking parser; credit to @Duncan Harris.

import sys
# Force the pure-Python ElementTree implementation: the C accelerator bypasses
# the _start/_end hooks overridden below. This line must run before
# xml.etree.ElementTree is first imported anywhere in the process.
sys.modules['_elementtree'] = None
import xml.etree.ElementTree as ET

class LineNumberingParser(ET.XMLParser):
    def _start(self, *args, **kwargs):
        # Here we assume the default XML parser which is expat
        # and copy its element position attributes into output Elements
        element = super()._start(*args, **kwargs)  # Python 3 style super()
        element._start_line_number = self.parser.CurrentLineNumber
        element._start_column_number = self.parser.CurrentColumnNumber
        element._start_byte_index = self.parser.CurrentByteIndex
        return element

    def _end(self, *args, **kwargs):
        element = super()._end(*args, **kwargs)
        element._end_line_number = self.parser.CurrentLineNumber
        element._end_column_number = self.parser.CurrentColumnNumber
        element._end_byte_index = self.parser.CurrentByteIndex
        return element

# parse the file with the custom parser so every element carries its position
tree = ET.parse('test.xml', parser=LineNumberingParser())

Now report the line number of the matching value. The line numbers reported by expat are 1-based and absolute within the file, so no offset from the root element is needed:

for each in tree.findall(".//header"):
    if each[0].text == "mykey":
        # each[1] is the <value> element; report the line where its start tag begins
        print(f"{each[1].text} at line number -> {each[1]._start_line_number}")
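
If overriding the private _start/_end hooks is a concern, one possible alternative is to drive expat directly through the standard xml.parsers.expat module and record the line number as each <value> start tag is seen. This is only a sketch under that assumption: the function name find_value_with_line and the sample string are hypothetical and do not come from the original answer.

```python
import xml.parsers.expat

# Sample data (assumption: same structure as the question's file)
sample = """<charles-session>
<header>
<name>Date</name>
<value>Wed, 13 Oct 2021 22:02:42 GMT</value>
</header>
<header>
<name>mykey</name>
<value>XX111</value>
</header>
</charles-session>
"""


def find_value_with_line(xml_text, key):
    """Return (value_text, line) for the first header named key, else None."""
    parser = xml.parsers.expat.ParserCreate()
    state = {"tag": None, "name": "", "value": "", "line": None, "result": None}

    def start(tag, attrs):
        state["tag"] = tag
        if tag == "header":
            # a new header begins: reset what was collected for the last one
            state["name"], state["value"], state["line"] = "", "", None
        elif tag == "value":
            # 1-based line on which the <value> start tag begins
            state["line"] = parser.CurrentLineNumber

    def chars(data):
        # text may arrive in several chunks, so accumulate it
        if state["tag"] in ("name", "value"):
            state[state["tag"]] += data

    def end(tag):
        state["tag"] = None
        if tag == "header" and state["result"] is None:
            if state["name"].strip() == key:
                state["result"] = (state["value"].strip(), state["line"])

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return state["result"]


print(find_value_with_line(sample, "mykey"))  # ('XX111', 8)
```

This avoids the sys.modules workaround entirely, at the cost of not building an element tree.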


Upvotes: 1
