Reputation: 13
[Python 3.4][Windows 7]
If there is an easy way to read a whole .xml file as one string, like a .txt file, that would be enough. But to describe the problem precisely:
This is the first time I have dealt with a .xml file. I have a .xml file containing mainly dictionaries (of further dictionaries). I want to get certain keys and values out of the dictionaries and write them to a .txt file, so a dict (or something else) in Python would be enough.
To make an example:
This is the xml file (library.xml):
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>Version<\key><integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>
</dict>
</plist>
I did some research and thought I could do it with the xml.etree.ElementTree module. So I tried this:
import xml.etree.ElementTree as ET
tree = ET.parse('library.xml')
root = tree.getroot()
I only get this message:
(Unicode Error) 'unicodeescape' codec can't decode bytes…
What I want is something like this (or the same as a dict, it doesn't matter):
[['Name: spam', 'Detail: spam spam'], ['Name: ham', 'Detail: ham ham']]
EDIT: The XML code was incorrect, sorry.
EDIT: Added the last paragraph.
Upvotes: 0
Views: 1296
Reputation: 13
I just wanted to let you know that I have solved it this way:
with open('library.xml', 'r', encoding='UTF-8') as file:
    text = file.read()
(and some regular expressions to get the dicts I want)
This is probably very inefficient, since I read the complete file as text, but I don't care about efficiency here because the function is called only once in my program ;)
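For readers wondering what such a regular-expression approach could look like, here is a minimal sketch. The pattern and the names TRACK_RE and tracks are assumptions made for illustration, not the code actually used, and the sample text stands in for the contents of library.xml.

```python
import re

# Sample text standing in for the contents of library.xml.
text = """<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>"""

# Hypothetical pattern: a Name entry directly followed by a Detail entry.
TRACK_RE = re.compile(
    r"<key>Name</key><string>(.*?)</string>\s*"
    r"<key>Detail</key><string>(.*?)</string>"
)

# findall() returns one (name, detail) tuple per match.
tracks = [["Name: " + name, "Detail: " + detail]
          for name, detail in TRACK_RE.findall(text)]
print(tracks)
```

This only works while the file keeps exactly this layout; a real parser is more robust against reordered keys or extra whitespace.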
Upvotes: 0
Reputation: 19760
The Python standard library contains a module that reads plist files: plistlib. You can use it to solve your problem with an import and one command:
import plistlib
print(plistlib.readPlist('library.xml'))
Output:
{'Tracks': {'0001': {'Detail': 'spam spam', 'Name': 'spam'},
'0002': {'Detail': 'ham ham', 'Name': 'ham'}},
'Version': 1}
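Since the question is about Python 3.4, note that readPlist() was deprecated in that version and removed in Python 3.9, while plistlib.load() and plistlib.loads() were added in 3.4. A minimal sketch with loads(), assuming the closing tag in the question's XML has been corrected to </key>; the inline data and the name library are only for illustration:

```python
import plistlib

# plistlib.loads() expects bytes, so the sample document is a bytes literal.
data = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>Version</key><integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>
</dict>
</plist>
"""

library = plistlib.loads(data)
print(library["Tracks"]["0001"]["Name"])
```

For a file on disk, the equivalent would be opening it in binary mode ('rb') and passing the file object to plistlib.load().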
Upvotes: 1
Reputation: 10213
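I changed <\key> to </key> in the input and removed a dict tag that had no key defined for it.
1. Parse the XML data with the lxml.html module.
2. Get the main dict tag with the xpath() method.
3. Call the XMLtoDict() function.
4. Iterate over the children of the dict tag with the getchildren() method and a for loop.
5. Check whether the current tag is a key with an if statement.
6. Get the tag that follows the key with the getnext() method.
7. If the next tag is an integer tag, the value type is int.
8. If the next tag is a string tag, the value type is string.
9. If the next tag is a dict tag, the value type is dict and the function is called again, i.e. a recursive call.
Code: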
data = """<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>Version</key>
<integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>
</dict>
</plist>
"""
def XMLtoDict(root):
    # Convert a plist <dict> element into a Python dict.
    result = {}
    for i in root.getchildren():
        if i.tag=="key":
            key = i.text
            # The value is the sibling element right after the <key>.
            next_tag = i.getnext()
            next_tag_name = next_tag.tag
            if next_tag_name=="integer":
                value = int(next_tag.text)
            elif next_tag_name=='string':
                value = next_tag.text
            elif next_tag_name=='dict':
                value = XMLtoDict(next_tag)
            else:
                value = None
            result[key] = value
    return dict(result)

import lxml.html as ET
import pprint
root = ET.fromstring(data)
result = XMLtoDict(root.xpath("//plist/dict")[0])
pprint.pprint(result)
Output:
vivek@vivek:~/Desktop/stackoverflow$ python 12.py
{'Tracks': {'0001': {'Detail': 'spam spam', 'Name': 'spam'},
'0002': {'Detail': 'ham ham', 'Name': 'ham'}},
'Version': 1}
I do not get the exception you mention:
(Unicode Error) 'unicodeescape' codec can't decode bytes…
However, the tagging in library.xml is not correct. This code:
import xml.etree.ElementTree as ET
tree = ET.parse('library.xml')
raises the following exception for your input:
vivek@vivek:~/Desktop/stackoverflow$ python 12.py
Traceback (most recent call last):
File "12.py", line 46, in <module>
tree = ET.parse('library.xml')
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
tree.parse(source, parser)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 4, column 15
This exception is due to the invalid closing tag. To fix it, change <key>Version<\key>
to <key>Version</key>
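A quick way to see the difference between the two closing tags is to parse a small fragment of each. This is only a sketch; the helper name is_well_formed and the shortened fragments are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Raw string so the backslash is kept exactly as in the broken file.
bad = r"<dict><key>Version<\key><integer>1</integer></dict>"
good = "<dict><key>Version</key><integer>1</integer></dict>"

def is_well_formed(xml_text):
    # Return True if ElementTree can parse the text, False on a ParseError.
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(bad))
print(is_well_formed(good))
```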
The same conversion with the xml.etree.ElementTree module:
Code:
def XMLtoDict(root):
    # Convert a plist <dict> element into a Python dict.
    result = {}
    children_tags = list(root)
    for j, i in enumerate(children_tags):
        if i.tag=="key":
            key = i.text
            # The value is the element right after the <key>.
            next_tag = children_tags[j+1]
            next_tag_name = next_tag.tag
            if next_tag_name=="integer":
                value = int(next_tag.text)
            elif next_tag_name=='string':
                value = next_tag.text
            elif next_tag_name=='dict':
                value = XMLtoDict(next_tag)
            else:
                value = None
            result[key] = value
    return dict(result)

def XMLtoList(root):
    # Same walk as XMLtoDict, but keeps document order as [key, value] pairs.
    result = []
    children_tags = list(root)
    for j, i in enumerate(children_tags):
        if i.tag=="key":
            key = i.text
            next_tag = children_tags[j+1]
            next_tag_name = next_tag.tag
            if next_tag_name=="integer":
                value = int(next_tag.text)
            elif next_tag_name=='string':
                value = next_tag.text
            elif next_tag_name=='dict':
                value = XMLtoList(next_tag)
            else:
                value = None
            result.append([key, value])
    return list(result)

import xml.etree.ElementTree as ET
import pprint
tree = ET.parse('library.xml')
root = tree.getroot()
dict_tag = root.find("dict")
if dict_tag is not None:
    result = XMLtoDict(dict_tag)
    print("Result in Dictionary:-")
    pprint.pprint(result)
    result = XMLtoList(dict_tag)
    print("\nResult in List:-")
    pprint.pprint(result)
Output:
vivek@vivek:~/Desktop/stackoverflow$ python 12.py
Result in Dictionary:-
{'Tracks': {'0001': {'Detail': 'spam spam', 'Name': 'spam'},
            '0002': {'Detail': 'ham ham', 'Name': 'ham'}},
 'Version': 1}

Result in List:-
[['Version', 1],
 ['Tracks',
  [['0001', [['Name', 'spam'], ['Detail', 'spam spam']]],
   ['0002', [['Name', 'ham'], ['Detail', 'ham ham']]]]]]
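To get from the parsed dictionary to the 'Name: spam' style list asked for in the question, something like the following could work. It is a sketch only: the function name track_lines and its fields parameter are hypothetical, and the result dict is copied from the output above.

```python
# Parsed structure as shown in the output above.
result = {
    'Tracks': {'0001': {'Detail': 'spam spam', 'Name': 'spam'},
               '0002': {'Detail': 'ham ham', 'Name': 'ham'}},
    'Version': 1,
}

def track_lines(library, fields=("Name", "Detail")):
    # Build one list of "field: value" strings per track, ordered by track key.
    rows = []
    for track_id in sorted(library["Tracks"]):
        track = library["Tracks"][track_id]
        rows.append(["%s: %s" % (field, track[field]) for field in fields])
    return rows

rows = track_lines(result)
print(rows)
```

Each inner list could then be written to the .txt file, for example as one line per track with ', '.join(row).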
Upvotes: 0