Reputation: 914
I'm trying to parse the summary and results elements from an XML file.
A small snippet:
<result>
<resultType>Potential Problem</resultType>
<lineNum>296</lineNum>
<columnNum>29</columnNum>
<errorMsg><a href="https://achecker.ca/checker/suggestion.php?id=43"
onclick="AChecker.popup('https://achecker.ca/checker/suggestion.php?id=43'); return false;"
title="Suggest improvements on this error message" target="_new"><code>h2</code> may be used for formatting.</a>
</errorMsg>
<errorSourceCode><h2>O portal netemprego.gov.pt foi substitu&iacute;do pelo iefponline.</h2></errorSourceCode>
<sequenceID>296_29_43</sequenceID>
<decisionPass>This <code>h2</code> element is really a section header.</decisionPass>
<decisionFail>This <code>h2</code> element is used to format text (not really a section header).</decisionFail>
</result>
I'm getting an error message: xml.etree.ElementTree.ParseError: undefined entity: line 55, column 51.
I know that this error is related to the encoding. The file starts with an XML declaration specifying UTF-8, which seems to be the right encoding for the characters it contains. After reading about this and trying multiple workarounds, I'm still not able to avoid the error. What can I do in Python to fix this and parse the summary and results elements?
Upvotes: 1
Views: 1542
Reputation: 163322
No, it's nothing to do with encoding. It's because you have an entity reference, &iacute;, that is not defined anywhere. If this were HTML, the entity name would be built in, but that's not the case for XML. Apart from the five predefined entities (&amp;, &lt;, &gt;, &quot; and &apos;), entity references in XML are not recognised unless they are defined in the DTD.
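If you can't change how the file is produced, one workaround (not part of the answer above, just a sketch) is to rewrite HTML named entity references such as &iacute; as numeric character references (&#237;) before parsing, since XML understands numeric references without any DTD. The names replace_html_entities and parse_xml_with_html_entities are hypothetical; the lookup table is the standard library's html.entities.name2codepoint (HTML 4 names). This assumes the file is plain UTF-8 text and does not use CDATA sections or comments containing "&name;" text, because the regex would rewrite those too.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# The five entities every XML parser knows without a DTD; leave them untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &iacute; (not numeric ones like &#237;).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text):
    """Rewrite HTML named entity references as numeric character references.

    Hypothetical helper. XML-predefined names and names unknown to HTML 4
    are left as-is, so the parser still reports truly undefined entities.
    """
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = html.entities.name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)
        return f"&#{codepoint};"

    return ENTITY_REF.sub(substitute, text)


def parse_xml_with_html_entities(text):
    """Parse XML text after replacing HTML-only entity references."""
    return ET.fromstring(replace_html_entities(text))
```

Usage would be along the lines of opening the file with encoding="utf-8", passing f.read() to parse_xml_with_html_entities, and then calling root.iter("result") on the returned element. Alternatively, in line with the answer, declaring the entity in an internal DTD subset (for example <!ENTITY iacute "&#237;">) also lets the parser resolve it.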
Upvotes: 1