Reputation: 69
At one point in my XML file, a contributor element contains only an IP address:
<revision>
<id>1</id>
<timestamp>2012-10-25T15:50:18Z</timestamp>
<contributor>
<ip>127.0.0.1</ip>
</contributor>
</revision>
At another point in the same file, a contributor element contains a username and an id:
<revision>
<id>2</id>
<parentid>1</parentid>
<timestamp>2012-10-26T20:13:56Z</timestamp>
<contributor>
<username>Reedy</username>
<id>2</id>
</contributor>
</revision>
I wrote a Python script that parses the XML file and writes the tags we need to an output file. The contributor element comes in two forms, though: one with only ip, the other with username and id. I want to ignore ip and write only username and id to the output file. When the file contains both forms, I get a key error: KeyError: 'username'
This is my code:
import xmltodict

with open('path to xml file') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read())

page = dic_xml['mediawiki']['page']
data = list()
for rev in page['revision']:
    my_string = ""
    my_string += " " + "username:" + dict(rev['contributor'])['username']
    my_string += " " + "userid:" + dict(rev['contributor'])['id']
    my_string += "\n"
    data.append(my_string)

with open('output', 'w') as writingFile:
    for i in data:
        writingFile.write(i)
Upvotes: 1
Views: 2783
Reputation: 107652
Simply use Python's built-in xml.etree.ElementTree module. Each parsed element has tag and text attributes, so you can filter on the tag name:
First contributor type:
import xml.etree.ElementTree as etree
xmlfile = '''\
<revision>
<id>1</id>
<timestamp>2012-10-25T15:50:18Z</timestamp>
<contributor>
<ip>127.0.0.1</ip>
</contributor>
</revision>'''
dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# <NOTHING>
Second contributor type:
xmlfile = '''\
<revision>
<id>2</id>
<parentid>1</parentid>
<timestamp>2012-10-26T20:13:56Z</timestamp>
<contributor>
<username>Reedy</username>
<id>2</id>
</contributor>
</revision>'''
dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# username: Reedy
# id: 2
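The two cases above can also be handled in one pass over a file that mixes both contributor types. Below is a minimal sketch, not a drop-in replacement for your script: the `<page>` wrapper in the sample and the `contributor_lines` helper are my own assumptions for illustration, and the output line format only approximates what your code builds. It relies on `findtext` returning `None` when a child element is missing, so IP-only contributors are skipped instead of raising an error.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample: both revisions from the question under an assumed <page> root.
xml_text = '''\
<page>
<revision>
<id>1</id>
<timestamp>2012-10-25T15:50:18Z</timestamp>
<contributor>
<ip>127.0.0.1</ip>
</contributor>
</revision>
<revision>
<id>2</id>
<parentid>1</parentid>
<timestamp>2012-10-26T20:13:56Z</timestamp>
<contributor>
<username>Reedy</username>
<id>2</id>
</contributor>
</revision>
</page>'''


def contributor_lines(xml_string):
    """Return one output line per contributor that has both username and id."""
    root = etree.fromstring(xml_string)
    lines = []
    # iter() walks the whole tree, so the nesting depth of <contributor> does not matter
    for contributor in root.iter('contributor'):
        # findtext returns None when the child is absent (the IP-only case)
        username = contributor.findtext('username')
        userid = contributor.findtext('id')
        if username is None or userid is None:
            continue
        lines.append('username: ' + username + ' userid: ' + userid + '\n')
    return lines


lines = contributor_lines(xml_text)
with open('output', 'w') as writingFile:
    writingFile.writelines(lines)
```

One caveat: if your real file declares a default XML namespace on its root element, ElementTree includes that namespace in every tag name, so the plain names used here would not match until you add the namespace to them.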
Upvotes: 1