irom

Reputation: 3596

extract text between xml tags in python

I have the XML string below and am trying to print the text between the domain, receive_time, serial and seqno tags for each entry tag.

xml="""
<response status="success" code="19"><result><msg><line>query job enqueued with jobid 19032</line></msg><job>19032</job></result></response>
19032
<response status="success"><result>
  <job>
    <tenq>14:10:09</tenq>
    <tdeq>14:10:09</tdeq>
    <tlast>19:00:00</tlast>
    <status>ACT</status>
    <id>19032</id>
    <cached-logs>64</cached-logs>
  </job>
  <log>
    <logs count="20" progress="29">
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      </logs>
  </log>
</result></response>
"""

I am using xml.etree.ElementTree. To get the text inside the domain tag I tried node.attrib.get('domain') and node.get('domain'), without success. Please advise.

import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print(node)

It can be another Python library too; it does not have to be xml.etree. I do not want to print the text between tags blindly; I need to print each tag name followed by its text, i.e.:

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120

etc

Upvotes: 5

Views: 22575

Answers (2)

Vivek Kalyanarangan

Reputation: 9081

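domain is a child element of entry, not an attribute, so node.get('domain') returns None. You could look up a single tag with the find() method, but to print every child, iterate over each entry's descendants and read each element's tag and text attributes -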

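import xml.etree.ElementTree as ET

# Assumes xml holds one well-formed document (a single root element).
tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print('\n')
    # node.iter() yields the entry itself first, then its descendants
    for elem in node.iter():
        if elem.tag != node.tag:  # skip the entry element itself
            print("{}: {}".format(elem.tag, elem.text))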

Hope this helps!

Output -

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120


domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120
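
If only the four tags named in the question are wanted, a minimal sketch with findtext() may be enough. It assumes the input is a single XML document, since ET.fromstring() accepts only one root element; the trimmed sample_xml string and the FIELDS and entry_lines names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: one <response> document with a single <entry>.
sample_xml = """<response status="success"><result>
  <log>
    <logs count="20" progress="29">
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
    </logs>
  </log>
</result></response>"""

# Hypothetical name: the child tags we want, in print order.
FIELDS = ("domain", "receive_time", "serial", "seqno")


def entry_lines(xml_text):
    """Return 'tag: text' lines for the chosen fields of every <entry>."""
    root = ET.fromstring(xml_text)
    lines = []
    for entry in root.iter("entry"):
        for tag in FIELDS:
            # findtext() returns None when the child tag is missing
            lines.append("{}: {}".format(tag, entry.findtext(tag)))
    return lines


for line in entry_lines(sample_xml):
    print(line)
```

Listing the tags explicitly keeps the output order fixed and skips any extra children an entry may carry.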

Upvotes: 11

G.Vitelli

Reputation: 1287

You can use SAX to get the inner text content of the XML elements. SAX is a streaming parser, so it processes the XML without reading the whole document into memory the way a DOM parser does. See Python's xml.sax module.
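
A rough sketch of the SAX approach, using the standard library's xml.sax module. The EntryHandler class, the WANTED set and the shortened sample document are illustrative assumptions, not part of the original question.

```python
import xml.sax

# Hypothetical name: the child tags to report for each <entry>.
WANTED = {"domain", "receive_time", "serial", "seqno"}


class EntryHandler(xml.sax.ContentHandler):
    """Collects (tag, text) pairs for the wanted tags inside <entry>."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_entry = False
        self._current = None
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "entry":
            self._in_entry = True
        elif self._in_entry and name in WANTED:
            self._current = name
            self._chunks = []

    def characters(self, content):
        # Text can arrive in several pieces, so buffer it until the end tag
        if self._current is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "entry":
            self._in_entry = False
        elif name == self._current:
            self.pairs.append((name, "".join(self._chunks).strip()))
            self._current = None


# Shortened sample document (bytes), assumed for this sketch.
sample = b"""<logs><entry logid="2473601">
<domain>1</domain>
<receive_time>2017/11/26 14:10:08</receive_time>
<serial>007901004140</serial>
<seqno>10156449120</seqno>
</entry></logs>"""

handler = EntryHandler()
xml.sax.parseString(sample, handler)
for tag, text in handler.pairs:
    print("{}: {}".format(tag, text))
```

For a document this small ElementTree is simpler; the event-driven handler mainly pays off when the log output is too large to hold in memory.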

Upvotes: 2
