Reputation: 6393
I am using the Saxon processor to transform a huge XML file (7,000+ lines) into an RSS 2.0 XML file.
I have no control over the input XML files. They are pulled from a server, and my XSL file is supposed to transform them into an RSS feed.
Occasionally the input XML file has an element containing an href, like so:
<A href="https://www.google.com/maps/preview?q=tehran+iran&ie=UTF-8&hq=&hnear=0x3f8e00491ff3dcd9:0xf0b3697c567024bc,Tehran,+Iran&gl=us&ei=24iMU-jvFNLNsQTwi4DgAQ&ved=0CKsBELYDMBQ&source=newuser-ws">(map)</A>
The Saxon processor doesn't like a certain part of this string, though. Here is the error message:
Error on line 837 column 62 of production.xml: SXXP0003: Error reported by XML parser: The reference to entity "ie" must end with the ';' delimiter. org.xml.sax.SAXParseException; systemId: file:/C:/XSLT/Test3/production.xml; lineNumber: 837; columnNumber: 62; The reference to entity "ie" must end with the ';' delimiter.
Based on the error, it appears the processor is confusing the ie parameter in the URL string with an XML element.
Is there anything I could add to the RSS 2.0 XSL stylesheet that would tell the Saxon processor to skip over lines like these? I do not actually need the information from <A> elements such as:
<A href="https://www.google.com/maps/preview?q=tehran+iran&ie=UTF-8&hq=&hnear=0x3f8e00491ff3dcd9:0xf0b3697c567024bc,Tehran,+Iran&gl=us&ei=24iMU-jvFNLNsQTwi4DgAQ&ved=0CKsBELYDMBQ&source=newuser-ws">(map)</A>
So if I could skip over lines like these entirely, and that resolved the error, that would be great. Alternatively, if the Saxon processor is suspected to have a bug and another processor would not have this problem, a recommendation for a more appropriate processor would be welcome.
Upvotes: 0
Views: 552
Reputation: 5432
The input XML is not well-formed. The & must be escaped. You can correct your input by replacing all occurrences of a bare & with &amp;.
And also, the other characters that you would have to escape if present in your XML are: " with &quot;, ' with &apos;, < with &lt;, and > with &gt;
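Because the XML parser rejects the file before the stylesheet ever runs, the repair has to happen before the input reaches Saxon. A minimal sketch of such a pre-processing step in Java follows; the class name AmpersandEscaper and the method escapeBareAmpersands are hypothetical names chosen for illustration, and the approach assumes the file contains no CDATA sections and that any text shaped like &name; is an intentional entity reference.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Saxon): escapes '&' characters that do not
// already begin an entity or character reference.
final class AmpersandEscaper {

    // Matches '&' unless it is followed by a named reference (&amp;),
    // a decimal reference (&#38;) or a hexadecimal reference (&#x26;).
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Returns the input with every bare '&' replaced by "&amp;".
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // Shortened version of the href from the question.
        String broken =
                "<A href=\"https://www.google.com/maps/preview?q=tehran+iran&ie=UTF-8&hq=\">(map)</A>";
        System.out.println(escapeBareAmpersands(broken));
    }
}
```

The cleaned string can then be written to a temporary file, or wrapped in a java.io.StringReader, and passed to the transformation in place of the original input. References that are already escaped are left untouched, so running the step twice does not double-escape anything.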
Upvotes: 3