Reputation: 2769
I receive many XML files, and some of them declare the wrong encoding (e.g. the XML header says ISO-8859-1, but all the strings are actually UTF-8, and so on).
I parse them with xml.etree.ElementTree, which reads the encoding from the XML header (and that is sometimes wrong):
input_element = xml.etree.ElementTree.parse("input.xml").getroot()
I would like to force a different encoding and ignore the one in the header.
Is there a simple way to do this?
Upvotes: 2
Views: 2774
Reputation: 338208
If you are sure of the encoding, you can use open()
to read the file into a string, and then use ElementTree.fromstring()
to convert that string into an XML document.
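from xml.etree import ElementTree
with open("input.xml", encoding="Windows-1252") as fp:
    xml_string = fp.read()  # decoded with the encoding you chose, not the declared one
root = ElementTree.fromstring(xml_string)  # fromstring() returns the root Element directly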
This ignores the encoding named in the XML declaration, because the parser receives text that you have already decoded yourself rather than raw bytes. For normal, compliant XML documents this method is not recommended, and you should use ElementTree.parse('filename')
instead.
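To make the effect concrete, here is a minimal sketch that builds a deliberately mislabeled file (header says ISO-8859-1, bytes are UTF-8) and parses it three ways. The helper names `parse_decoding_manually` and `parse_with_parser_override` and the sample content are made up for illustration. The second helper is an alternative not mentioned above: it relies on the documented `encoding` argument of `xml.etree.ElementTree.XMLParser`, which overrides the encoding given in the file. Treat it as a sketch to check against your own files, not as the definitive fix.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_decoding_manually(path, encoding):
    """Decode the file ourselves, then parse the resulting str (the approach above)."""
    with open(path, encoding=encoding) as fp:
        return ET.fromstring(fp.read())


def parse_with_parser_override(path, encoding):
    """Let ElementTree read the bytes, but override the declared encoding."""
    parser = ET.XMLParser(encoding=encoding)
    return ET.parse(path, parser=parser).getroot()


# Hypothetical sample: the header claims ISO-8859-1, the bytes are really UTF-8.
mislabeled = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<root><name>café</name></root>"
).encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.xml")
    with open(path, "wb") as fp:
        fp.write(mislabeled)

    # Trusts the header, so the two UTF-8 bytes of "é" become two characters.
    naive = ET.parse(path).getroot().findtext("name")
    manual = parse_decoding_manually(path, "utf-8").findtext("name")
    override = parse_with_parser_override(path, "utf-8").findtext("name")

print(repr(naive))
print(repr(manual))
print(repr(override))
```

Both helpers return the root element, so either can stand in for `ElementTree.parse("input.xml").getroot()` from the question. The parser override keeps the file in binary mode and avoids loading it into a string first, which may matter for large inputs.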
Upvotes: 6