Krishna Prasad

Reputation: 69

How to ignore specific tags in xml files?

At some points in my XML file, the contributor element contains only an IP address:

<revision>
      <id>1</id>
      <timestamp>2012-10-25T15:50:18Z</timestamp>
      <contributor>
        <ip>127.0.0.1</ip>
      </contributor>
</revision>

At other points in the same file, the contributor element contains a username and an id:

<revision>
      <id>2</id>
      <parentid>1</parentid>
      <timestamp>2012-10-26T20:13:56Z</timestamp>
      <contributor>
        <username>Reedy</username>
        <id>2</id>
      </contributor>
</revision>

I wrote a Python script that parses the XML file and writes the tags I need to an output file. The contributor element comes in two forms, though: one with ip, and one with username and id. I want to ignore ip and write only username and id to the output file. When the file contains both forms, I get a key error: KeyError: 'username'

Here is my code:

import xmltodict
with open('path to xml file') as xml_file:
  dic_xml = xmltodict.parse(xml_file.read())
  page = dic_xml['mediawiki']['page']
  data = list()
  for rev in page['revision']:
      my_string = ""
      my_string += " " + "username:" + dict(rev['contributor'])['username']
      my_string += " " + "userid:" + dict(rev['contributor'])['id']
      my_string += "\n"
      data.append(my_string)

with open('output', 'w') as writingFile:
    for i in data:
        writingFile.write(i)

Upvotes: 1

Views: 2783

Answers (1)

Parfait

Reputation: 107652

Use Python's built-in xml.etree.ElementTree module. Each element it returns has tag and text attributes, so you can filter on the tag name and skip ip:

First contributor type:

import xml.etree.ElementTree as etree

xmlfile = '''\
<revision>
      <id>1</id>
      <timestamp>2012-10-25T15:50:18Z</timestamp>
      <contributor>
        <ip>127.0.0.1</ip>
      </contributor>
</revision>'''

dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# (the output file is empty, because ip is the only child of contributor)

Second contributor type:

xmlfile = '''\
<revision>
      <id>2</id>
      <parentid>1</parentid>
      <timestamp>2012-10-26T20:13:56Z</timestamp>
      <contributor>
        <username>Reedy</username>
        <id>2</id>
      </contributor>
</revision>'''

dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# username: Reedy
# id: 2
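
Both contributor types can also be handled in one pass over a document that holds many revisions. The following is only a sketch under some assumptions: the `<page>` wrapper and the `contributor_lines` helper are hypothetical names used for illustration, and the sample has no XML namespace. If your real file declares a default namespace, the tag names passed to `iter` and `findtext` would need the namespace prefix.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample: both revisions from the question under one assumed <page> root.
xml_text = '''\
<page>
  <revision>
    <id>1</id>
    <timestamp>2012-10-25T15:50:18Z</timestamp>
    <contributor>
      <ip>127.0.0.1</ip>
    </contributor>
  </revision>
  <revision>
    <id>2</id>
    <parentid>1</parentid>
    <timestamp>2012-10-26T20:13:56Z</timestamp>
    <contributor>
      <username>Reedy</username>
      <id>2</id>
    </contributor>
  </revision>
</page>'''


def contributor_lines(xml_string):
    """Return one output line per contributor that has both username and id."""
    root = etree.fromstring(xml_string)
    lines = []
    for contributor in root.iter('contributor'):
        # findtext returns None when the child element is absent,
        # so ip-only contributors are skipped instead of raising an error.
        username = contributor.findtext('username')
        user_id = contributor.findtext('id')
        if username is None or user_id is None:
            continue
        lines.append('username:' + username + ' userid:' + user_id + '\n')
    return lines


lines = contributor_lines(xml_text)
print(''.join(lines), end='')
```

To write the result to a file as in the question, open the output file once and pass `lines` to its `writelines` method.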

Upvotes: 1
