Kyle Bridenstine

Reputation: 6393

Saxon Processor throwing exceptions for href parts of input XML file

I am using the Saxon Processor to transform a huge XML file (7,000+ lines) into an RSS 2.0 XML file.

I have no control over the input XML files. They are pulled from a server, and my XSL file is supposed to transform them into an RSS feed.

Occasionally the input XML file contains an element with an href attribute, like so:

  <A href="https://www.google.com/maps/preview?q=tehran+iran&ie=UTF-8&hq=&hnear=0x3f8e00491ff3dcd9:0xf0b3697c567024bc,Tehran,+Iran&gl=us&ei=24iMU-jvFNLNsQTwi4DgAQ&ved=0CKsBELYDMBQ&source=newuser-ws">(map)</A>

The Saxon Processor doesn't like a certain part of this string though. Here is the error message,

Error on line 837 column 62 of production.xml: SXXP0003: Error reported by XML parser: The reference to entity "ie" must end with the ';' delimiter. org.xml.sax.SAXParseException; systemId: file:/C:/XSLT/Test3/production.xml; lineNumber: 837; columnNumber: 62; The reference to entity "ie" must end with the ';' delimiter.

Based on the error, it appears the processor is mistaking the ie parameter in the URL string for an XML entity reference.

Is there anything I could add into the RSS 2.0 XSL stylesheet that would tell the Saxon Processor to skip over lines like these? I actually do not need the information from <A>,

  <A href="https://www.google.com/maps/preview?q=tehran+iran&ie=UTF-8&hq=&hnear=0x3f8e00491ff3dcd9:0xf0b3697c567024bc,Tehran,+Iran&gl=us&ei=24iMU-jvFNLNsQTwi4DgAQ&ved=0CKsBELYDMBQ&source=newuser-ws">(map)</A>

If I could skip lines like these entirely, and that resolved the error, that would be great. Alternatively, if this looks like a bug in the Saxon Processor and another processor would not have the problem, a recommendation for a more appropriate processor would be just as welcome.

Upvotes: 0

Views: 552

Answers (1)

Linga Murthy C S

Reputation: 5432

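The input XML is not well-formed: a literal & must be escaped. This is not a Saxon bug. The error comes from the XML parser, which rejects the file before the stylesheet ever runs, so nothing in the XSL can tell it to skip those lines, and any conforming XML parser will fail the same way. You can correct your input by replacing every unescaped & with &amp;.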

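XML predefines entities for four other characters as well. Of these, < must always be escaped in text and attribute values; a quote only needs escaping inside an attribute value delimited by that same quote character, and > only where it would complete the sequence ]]> in text. The replacements are: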

" with &quot;,

' with &apos;,

< with &lt;, and

> with &gt;
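Since the input files cannot be changed at the source, one option is to clean each file before handing it to Saxon. The following is only a minimal sketch of that idea in Python, not part of the original answer: the function name escape_bare_ampersands and the regular expression are illustrative assumptions. It rewrites every & that does not already begin a named or numeric reference, and leaves existing references such as &amp; or &#38; untouched.

```python
import re

# An '&' is treated as "bare" when it is NOT followed by something that
# looks like a complete reference: a name (&amp;), a decimal character
# reference (&#38;) or a hexadecimal one (&#x26;), each ending in ';'.
# Assumption: this pattern is a heuristic, not a full XML grammar.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace every bare '&' with '&amp;' and leave existing references alone."""
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    # Shortened version of the href from the question.
    broken = '<A href="https://www.google.com/maps/preview?q=tehran+iran&ie=UTF-8&hq=">(map)</A>'
    print(escape_bare_ampersands(broken))
```

The cleaned text can then be written to a new file and used as the transformation input. Two caveats: a reference-shaped token that names an undeclared entity (for example &foo;) is left as-is and would still be rejected by the parser, and the sketch does not skip CDATA sections or comments, where a bare & is legal.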

Upvotes: 3
