Reputation: 21
I need your help to resolve what seems to be an encoding issue.
I have a lot of input files that follow the same pattern as the one below:
<?xml version='1.0' encoding='iso-8859-1'?>
<root>
  <Module name="ModuleName">
    <Parameter Value="Data01$|Data02F1F5$|Data03:$|Data04 : $|"/>
  </Module>
</root>
I need to be able to parse the file, but it contains a lot of special characters, as you can see below:
I can't use lxml or Beautiful Soup.
I tried the different options below, but I couldn't find a solution:
from xml.etree import ElementTree

file = 'StackOverflow.xml'

# Attempt 1: read the file as ISO-8859-1, then rewrite it as UTF-8
with open(file, 'r', encoding='iso-8859-1') as f:
    string = f.read()
print(string)
with open(file, 'w', encoding='utf-8') as f:
    f.write(string)

# Attempt 2: parse the raw bytes, then write the tree back as UTF-8
with open(file, 'rb') as f:
    root = ElementTree.fromstring(f.read())
tree = ElementTree.ElementTree(root)
tree.write(file, encoding='utf-8', xml_declaration=True)

# Attempt 3: force the parser encoding, then re-encode the serialized tree
with open(file, 'rb') as f:
    parser = ElementTree.XMLParser(encoding='iso-8859-1')
    tree = ElementTree.parse(f, parser)
string = ElementTree.tostring(tree.getroot(), xml_declaration=True, encoding='utf-8').decode('utf-8').encode('iso-8859-1')
with open(file, 'wb') as f:
    f.write(string)
Upvotes: -1
Views: 113
Reputation: 3591
I can't reproduce your problem:
import xml.etree.ElementTree as ET
xml_file_path = "StackOverFlow.xml"
tree = ET.parse(xml_file_path)
root = tree.getroot()
for elem in root.iter():
    print(elem.tag, elem.attrib)
Output:
root {}
Module {'name': 'ModuleName'}
Parameter {'Value': 'Data01$|Data02F1F5$|Data03:$|Data04 : $|'}
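If it still fails on your side, one thing worth checking is whether the bytes on disk really match the encoding named in the XML declaration. The sketch below is only an illustration: the helper name `declared_encoding` and the sample bytes are my own assumptions, not taken from your files.

```python
import re
import xml.etree.ElementTree as ET


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default` if none."""
    match = re.match(rb"""<\?xml[^>]*encoding=['"]([A-Za-z0-9._-]+)['"]""", raw)
    return match.group(1).decode("ascii") if match else default


# Hypothetical sample: 'é' stored as the single byte 0xE9 (ISO-8859-1).
raw = (
    b"<?xml version='1.0' encoding='iso-8859-1'?>"
    b"<root><Parameter Value='caf\xe9$|'/></root>"
)

encoding = declared_encoding(raw)
# Decoding with the declared encoding raises UnicodeDecodeError if the bytes
# are not valid in that encoding (a useful check when the declaration is utf-8).
text = raw.decode(encoding)
# Given bytes, ElementTree reads the declaration itself.
root = ET.fromstring(raw)
value = root.find("Parameter").get("Value")
print(encoding, repr(value))
```

If the `decode` call raises, the declaration and the real encoding disagree, and that mismatch would be the thing to fix before parsing.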
Your picture shows utf-8, so I tried that encoding as well:
import xml.etree.ElementTree as ET
#even with utf-8 it works:
xml_str = """<?xml version='1.0' encoding='utf-8'?>
<root>
<Module name="ModuleName">
<Parameter Value="Data01$|Data02F1F5$|Data03:$|Data04 : $|" />
</Module>
</root>"""
root = ET.fromstring(xml_str)
for elem in root.iter():
    print(elem.tag, elem.attrib)
The output is the same:
root {}
Module {'name': 'ModuleName'}
Parameter {'Value': 'Data01$|Data02F1F5$|Data03:$|Data04 : $|'}
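Since your attempts all try to rewrite the file as UTF-8, here is a minimal sketch of that conversion with the standard library only. It assumes the file really is encoded the way its declaration says; the file names and the sample content (including the non-ASCII characters) are made up for the demonstration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample written to a temporary directory for the demo.
workdir = Path(tempfile.mkdtemp())
src = workdir / "input_latin1.xml"
dst = workdir / "output_utf8.xml"
src.write_bytes(
    (
        "<?xml version='1.0' encoding='iso-8859-1'?>\n"
        "<root><Module name='ModuleName'>"
        "<Parameter Value='Data01\u00b0$|Data02\u00e9$|'/>"
        "</Module></root>"
    ).encode("iso-8859-1")
)

# Passing a file name lets the parser read the bytes and honour the declaration.
tree = ET.parse(str(src))

# write() emits a new declaration and encodes the content as UTF-8.
tree.write(str(dst), encoding="utf-8", xml_declaration=True)

# Parse the converted file again to confirm the attribute survived the round trip.
value = ET.parse(str(dst)).getroot().find("Module/Parameter").get("Value")
print(repr(value))
```

The point of parsing first and writing second is that the declaration and the bytes stay consistent. Reading the text with `open(..., encoding='iso-8859-1')` and writing it back as UTF-8 leaves the old `encoding='iso-8859-1'` declaration in place, which can make the parser misread any non-ASCII characters afterwards.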
Upvotes: 0