idlackage

Reputation: 2863

Fatal Error :1:40: Content is not allowed in prolog

I have a super simple XML document encoded in UTF-16 LE.

<?xml version="1.0" encoding="utf-16"?><X id="1" />

I'm loading it as follows (using jcabi-xml):

BOMInputStream bomIn = new BOMInputStream(Main.class.getResourceAsStream("resources/test.xml"), ByteOrderMark.UTF_16LE);
String firstNonBomCharacter = Character.toString((char)bomIn.read());
Reader reader = new InputStreamReader(bomIn, "UTF-16");
String xmlString = IOUtils.toString(reader);
xmlString = xmlString.trim();
xmlString = firstNonBomCharacter + xmlString;
bomIn.close();
reader.close();
final XML xml = new XMLDocument(xmlString);

I have checked that there are no extra BOM/junk symbols (leading or anywhere) by saving out the file and inspecting it with a hex editor. The XML is properly formatted.

However, I still get the following error:

[Fatal Error] :1:40: Content is not allowed in prolog.
Exception in thread "main" java.lang.IllegalArgumentException: Invalid XML: "<?xml version="1.0" encoding="utf-16"?><X id="1" />"
    at com.jcabi.xml.DomParser.document(DomParser.java:115)
    at com.jcabi.xml.XMLDocument.<init>(XMLDocument.java:155)
    at Main.getTransformedString(Main.java:47)
    at Main.main(Main.java:26)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 40; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at com.jcabi.xml.DomParser.document(DomParser.java:105)
    ... 3 more

I have searched extensively for this error, but every result blames the BOM, which I have ruled out (to the best of my knowledge). What else could be wrong?

Upvotes: 2

Views: 1941

Answers (1)

Aaryn Tonita

Reputation: 490

The following works for me:

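    import com.jcabi.xml.XML;
    import com.jcabi.xml.XMLDocument;
    import java.io.InputStream;
    import javax.xml.transform.stream.StreamSource;
    try (InputStream stream = Test.class.getResourceAsStream("/Test.xml")) {
        StreamSource source = new StreamSource(stream); // pass the raw bytes, not a String
        final XML xml = new XMLDocument(source);
    }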

With the input file's hex dump:

FF FE 3C 00 3F 00 78 00 6D 00 6C 00 20 00 76 00 65 00 72 00 73 00 69 00  
6F 00 6E 00 3D 00 27 00 31 00 2E 00 30 00 27 00 20 00 65 00 6E 00 63 00 
6F 00 64 00 69 00 6E 00 67 00 3D 00 27 00 55 00 54 00 46 00 2D 00 31 00 
36 00 27 00 3F 00 3E 00 3C 00 58 00 20 00 69 00 64 00 3D 00 22 00 31 00 
22 00 2F 00 3E 00

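As far as I can tell, your example converts the contents of the file to a String. That is the problem: a String has no byte encoding, so the file's original encoding is thrown away as soon as the bytes are decoded. When the parser later converts the string back to a byte array, it uses UTF-8, but the prolog still declares UTF-16, so the declared and actual encodings disagree. This is consistent with the reported position: column 40 is immediately after the 39-character XML declaration, which is where the parser switches to the declared encoding.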
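The mismatch can be reproduced without jcabi-xml. The following is a minimal sketch that assumes only the JDK's built-in parser; the class name `EncodingMismatchDemo` and the helper `parses` are made up for illustration. It encodes the same String once as UTF-8 and once as UTF-16 and reports whether the parser accepts each byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; it uses the JDK parser directly, not jcabi-xml.
class EncodingMismatchDemo {

    // The document from the question, held as a String (no byte encoding attached).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"utf-16\"?><X id=\"1\" />";

    // Encodes the String with the given charset and reports
    // whether the parser accepts the resulting bytes.
    static boolean parses(String xml, Charset charset) throws Exception {
        byte[] bytes = xml.getBytes(charset);
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            // The parser rejected the bytes.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Bytes are UTF-8, but the declaration says utf-16.
        System.out.println("UTF-8 bytes:  " + parses(XML, StandardCharsets.UTF_8));
        // Bytes are UTF-16 (with a BOM), matching the declaration.
        System.out.println("UTF-16 bytes: " + parses(XML, StandardCharsets.UTF_16));
    }
}
```

Only the second call should succeed, because only there do the bytes match what the declaration promises.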

When I pass a StreamSource instead, the parser detects from the BOM that the file is encoded in UTF-16 LE, with no extra handling on my side.
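To illustrate the BOM detection in isolation, here is a small sketch that again assumes only the JDK parser. The class `BomDetectionDemo` and its helpers are hypothetical names, and the in-memory byte array stands in for the real file. No charset is named when parsing; the parser works it out from the leading FF FE bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical demo class; the byte array below stands in for Test.xml.
class BomDetectionDemo {

    // Builds the bytes of a UTF-16 LE file: the FF FE byte order mark, then the text.
    static byte[] utf16LeWithBom(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFF);
        out.write(0xFE);
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    // Parses the raw stream and returns the root element.
    // No charset is specified here; the parser inspects the bytes itself.
    static Element parseRoot(InputStream stream) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(stream)
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        byte[] file = utf16LeWithBom(
            "<?xml version='1.0' encoding='UTF-16'?><X id=\"1\"/>");
        try (InputStream stream = new ByteArrayInputStream(file)) {
            Element root = parseRoot(stream);
            System.out.println(root.getNodeName() + " id=" + root.getAttribute("id"));
        }
    }
}
```

The point of the design is that the bytes never pass through a String, so the BOM and the encoding declaration are both still available to the parser.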

If you are not on Java 7 or later and cannot use try-with-resources, call stream.close() explicitly as before.
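For the pre-Java 7 case, one possible shape is a try/finally block. This is only a sketch using the JDK parser; `ExplicitCloseDemo` and `TrackingStream` are hypothetical names, and the tracking stream exists solely to make the close() call observable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class showing explicit close() without try-with-resources.
class ExplicitCloseDemo {

    // Stand-in for the resource stream; records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Parses the stream and closes it in finally, so it is closed
    // whether parsing succeeds or throws.
    static String rootName(InputStream stream) throws Exception {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(stream)
                .getDocumentElement()
                .getNodeName();
        } finally {
            stream.close();
        }
    }

    public static void main(String[] args) throws Exception {
        TrackingStream stream = new TrackingStream("<X id=\"1\"/>".getBytes("UTF-8"));
        System.out.println(rootName(stream) + ", closed=" + stream.closed);
    }
}
```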

Upvotes: 2
