Mohammed Siddiq

Reputation: 558

Error while loading the string as XML in Scala

I have the following XML stored in a String:

<article mdate="2017-06-06" key="journals/geb/SonmezU05">
<author>Tayfun S&ouml;nmez</author>
<author orcid="0000-0001-7693-1635">M. Utku &Uuml;nver</author>
<title>House allocation with existing tenants: an equivalence.</title>
<pages>153-185</pages>
<year>2005</year>
<volume>52</volume>
<journal>Games and Economic Behavior</journal>
<number>1</number>
<ee>https://doi.org/10.1016/j.geb.2004.04.008</ee>
<url>db/journals/geb/geb52.html#SonmezU05</url>
</article>

When I parse it with

XML.loadString(xml)

I get the following error:

org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 23; The entity "ouml" was referenced, but not declared.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1902)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3061)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
    at scala.xml.factory.XMLLoader.loadXML(XMLLoader.scala:41)
    at scala.xml.factory.XMLLoader.loadXML$(XMLLoader.scala:37)
    at scala.xml.XML$.loadXML(XML.scala:60)
    at scala.xml.factory.XMLLoader.loadString(XMLLoader.scala:60)
    at scala.xml.factory.XMLLoader.loadString$(XMLLoader.scala:60)
    at scala.xml.XML$.loadString(XML.scala:60)

The error is caused by this line:

<author>Tayfun S&ouml;nmez</author>

I also tried converting the string to an InputStream:

XML.load(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))

but the problem persists. I have been struggling with this for quite a while, tried a bunch of things suggested online, and referred to posts like this, but made no progress. Any help will be appreciated.

Upvotes: 2

Views: 695

Answers (2)

Mayank K Rastogi

Reputation: 604

If &ouml; is the only missing entity, you can declare it inline with a DOCTYPE placed before the root element, as suggested by Kaustabh. The replacement value can be the numeric character reference for ö:

<!DOCTYPE article [
  <!ENTITY ouml "&#246;">
]>
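A minimal sketch of the inline approach applied to the question's data, assuming scala-xml 2.x on the classpath (the version in the `using` directive is an assumption). It prepends an internal DTD subset declaring both `ouml` and `Uuml`, then parses with a plain JDK SAX parser passed through `XML.withSAXParser`, so the result does not depend on how a particular scala-xml version configures its default parser. The names `doctype`, `record` and `loadWithEntities` are illustrative only.

```scala
//> using dep "org.scala-lang.modules::scala-xml:2.2.0"
import javax.xml.parsers.SAXParserFactory
import scala.xml.{Elem, XML}

// Internal DTD subset declaring the two HTML entities used in the record.
// &#246; is the character reference for ö (U+00F6), &#220; for Ü (U+00DC).
val doctype: String =
  """<!DOCTYPE article [
    |  <!ENTITY ouml "&#246;">
    |  <!ENTITY Uuml "&#220;">
    |]>
    |""".stripMargin

// A shortened copy of the record from the question, entities left as they are.
val record: String =
  """<article mdate="2017-06-06" key="journals/geb/SonmezU05">
    |<author>Tayfun S&ouml;nmez</author>
    |<author orcid="0000-0001-7693-1635">M. Utku &Uuml;nver</author>
    |<year>2005</year>
    |</article>""".stripMargin

// Prepend the DOCTYPE (it must come before the root element) and parse with a
// plain JDK SAX parser, which reads the internal subset and expands the entities.
def loadWithEntities(xml: String): Elem = {
  val parser = SAXParserFactory.newInstance().newSAXParser()
  XML.withSAXParser(parser).loadString(doctype + xml)
}

val article: Elem = loadWithEntities(record)
```

After parsing, `(article \ "author").map(_.text)` should give the decoded names "Tayfun Sönmez" and "M. Utku Ünver".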

However, if you have a lot of such entities, you are better off creating a separate .dtd file (say "myxml.dtd") and referencing it from your XML.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE myxml SYSTEM "myxml.dtd">
<myxml>
    <!-- The rest of your XML -->
</myxml>

For the parser to locate the file, it should be placed on the project's path. If you bundle the DTD file with your application, you can put it in your resources folder, look up its URI at runtime, and substitute that URI into the DOCTYPE of the XML string.

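import scala.xml.XML

// Resolve the bundled DTD from the classpath (this also works inside a jar)
val dtdUri = getClass.getClassLoader.getResource("myxml.dtd").toURI

// The XML declaration must be the very first thing in the string, so
// stripMargin removes the leading newline and indentation
val xmlString =
  s"""<?xml version="1.0" encoding="ISO-8859-1"?>
     |<!DOCTYPE myxml SYSTEM "$dtdUri">
     |<myxml>
     |    <!-- The rest of your XML -->
     |</myxml>""".stripMargin

val xml = XML.loadString(xmlString)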

Looking up the file through the ClassLoader ensures that it can still be accessed when your app is distributed as a jar.
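To try the external-DTD approach without setting up a resources folder, here is a self-contained sketch that writes a small DTD to a temporary file and references it by absolute URI. It assumes scala-xml 2.x and relies on the JDK's default SAX parser loading external DTDs, which is its default behaviour unless the JAXP configuration has been tightened. `dtdText`, `writeTempDtd` and `loadWithExternalDtd` are hypothetical names used only for illustration.

```scala
//> using dep "org.scala-lang.modules::scala-xml:2.2.0"
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import javax.xml.parsers.SAXParserFactory
import scala.xml.{Elem, XML}

// Hypothetical stand-in for the contents of a bundled myxml.dtd resource.
val dtdText: String =
  """<!ENTITY ouml "&#246;">
    |<!ENTITY Uuml "&#220;">
    |""".stripMargin

// Writes the DTD to a temporary file, so the sketch runs without classpath setup.
def writeTempDtd(): Path = {
  val path = Files.createTempFile("myxml", ".dtd")
  path.toFile.deleteOnExit()
  Files.write(path, dtdText.getBytes(StandardCharsets.UTF_8))
}

// Points the DOCTYPE at the DTD by absolute URI. The XML declaration is the
// very first character of the string, with no leading newline or indentation.
def loadWithExternalDtd(body: String, dtd: Path): Elem = {
  val header =
    s"""<?xml version="1.0" encoding="UTF-8"?>
       |<!DOCTYPE article SYSTEM "${dtd.toUri}">
       |""".stripMargin
  // A plain JDK SAX parser: non-validating, but it still loads the external DTD by default.
  val parser = SAXParserFactory.newInstance().newSAXParser()
  XML.withSAXParser(parser).loadString(header + body)
}
```

In the answer's setup, the temporary file would be replaced by the URI returned from `getClass.getClassLoader.getResource("myxml.dtd").toURI`.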

Upvotes: 0

Kaustabh

Reputation: 37

I think it is because &ouml; is not a predefined XML entity. It is fine in HTML because browsers know it, but XML predefines only five named entities (&lt;, &gt;, &amp;, &quot;, &apos;). Adding a declaration to your document may help.

<!DOCTYPE article [
  <!ENTITY ouml "&#246;">
]>

Do the same for &Uuml; (for example, <!ENTITY Uuml "&#220;">).
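A small sketch illustrating the point about predefined entities, assuming scala-xml 2.x: the five built-in XML entities parse without any declaration, `&ouml;` fails with the same `SAXParseException` as in the question, and a numeric character reference such as `&#246;` needs no declaration at all. `parseText` is a hypothetical helper.

```scala
//> using dep "org.scala-lang.modules::scala-xml:2.2.0"
import scala.util.{Success, Try}
import scala.xml.XML

// Wraps a fragment in a root element, parses it, and returns the decoded text.
def parseText(fragment: String): Try[String] =
  Try(XML.loadString(s"<t>$fragment</t>").text)

// XML itself predefines only these five named entities.
val predefined: Try[String] = parseText("&lt;&gt;&amp;&quot;&apos;")

// &ouml; is an HTML entity; without a DTD declaration the parser rejects it.
val htmlEntity: Try[String] = parseText("S&ouml;nmez")

// A numeric character reference is always valid XML, no declaration required.
val numericRef: Try[String] = parseText("S&#246;nmez")
```

Here `predefined` should be `Success("<>&\"'")`, `htmlEntity` a failure wrapping `org.xml.sax.SAXParseException`, and `numericRef` `Success("Sönmez")`.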

Upvotes: 1
