Reputation: 85
I have a process that parses an XML file. It worked with Java 5 on Apache Tomcat 6. Since I recompiled it with Java 7 and started running it on Apache Tomcat 7, I receive the following error:
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,60]
Message: Invalid encoding name "ISO8859-1".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
    at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:262)
    at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
    at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
    at org.simpleframework.xml.stream.StreamProvider.provide(StreamProvider.java:66)
    at org.simpleframework.xml.stream.NodeBuilder.read(NodeBuilder.java:58)
    at org.simpleframework.xml.core.Persister.read(Persister.java:543)
    at org.simpleframework.xml.core.Persister.read(Persister.java:444)
Here is the xml fragment used:
<?xml version="1.0" encoding="ISO8859-1" standalone="no" ?>
If I replace ISO8859-1 with UTF-8 in the file, parsing works, but editing the files is not an option for me.
The lib that I use is simple-xml-2.1.8.jar
As someone pointed out to me, ISO8859-1 is not a valid encoding name; ISO-8859-1 is the correct one. As I mentioned, it is difficult to ask the "producers" to correct their files, so I would like to handle the problem in my application.
Upvotes: 4
Views: 4376
Reputation: 122364
If you know the file encoding up front (UTF-8, ISO-8859-1 or whatever) then you should create a suitable Reader configured for that encoding, then use the Persister.read method that takes a Reader instead of the one that takes a File or InputStream. That way you are in control of the byte-to-character decoding rather than relying on the XML reader to detect the encoding (and fail, because the file declares it wrongly). So instead of
File f = new File(....);
MyType obj = persister.read(MyType.class, f);
you would do something more like
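File f = new File(....);
MyType obj = null;
// try-with-resources (Java 7+) closes the stream and the reader automatically
try (FileInputStream fis = new FileInputStream(f);
     InputStreamReader reader = new InputStreamReader(fis, "ISO-8859-1")) { // or UTF-8, ...
    // The Reader already yields characters, so the decoding is done by us, not by the parser
    obj = persister.read(MyType.class, reader);
}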
Upvotes: 1
Reputation: 29824
Get access to the Xerces XMLReader instance from Simple XML and call reader.setFeature("http://apache.org/xml/features/allow-java-encodings", true) before parsing the XML.
Since Java itself accepts ISO8859-1 as an alias for the ISO-8859-1 charset, this may just work.
The list of supported "features" of Xerces is available here
Alternatively, a good old regex on encoding="ISO8859-1" to fix the XML, applied prior to processing it, should do the trick.
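For the regex route, here is a minimal sketch of what I have in mind. The class and method names (EncodingDeclarationFixer, fixEncodingDeclaration, decodeAndFix) are made up for illustration and are not part of Simple XML. It assumes the files really are Latin-1 and that the bad name only needs fixing inside the XML declaration at the very start of the document.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Simple XML): rewrites the invalid
// encoding name "ISO8859-1" in the XML declaration to "ISO-8859-1".
class EncodingDeclarationFixer {

    // Matches encoding="ISO8859-1" (single or double quotes, any letter case),
    // but only inside an XML declaration that starts the document.
    private static final Pattern BAD_ENCODING = Pattern.compile(
            "^(<\\?xml\\b[^>]*?\\bencoding\\s*=\\s*)([\"'])ISO8859-1\\2",
            Pattern.CASE_INSENSITIVE);

    // Returns the document with the declaration corrected, or the input
    // unchanged when the declaration does not contain the bad name.
    static String fixEncodingDeclaration(String xml) {
        return BAD_ENCODING.matcher(xml).replaceFirst("$1$2ISO-8859-1$2");
    }

    // Assumption: the producers' files are Latin-1, so every byte maps to
    // exactly one character and decoding cannot fail.
    static String decodeAndFix(byte[] rawFileContent) {
        String xml = new String(rawFileContent, StandardCharsets.ISO_8859_1);
        return fixEncodingDeclaration(xml);
    }
}
```

You would then wrap the result in a java.io.StringReader and hand it to the Persister.read overload that takes a Reader, as in the other answer. This reads the whole file into memory, which is probably fine for small documents but worth reconsidering for very large ones.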
Upvotes: 2