Reputation: 11
We moved from Java 8 to Java 11 and now have problems processing specific UTF-8 characters (e.g. "赵𮧵"). With Java 8 our XML transformation produced "赵𮧵"; with Java 11 we get "赵��".
While debugging the code, I found that this happens somewhere in the SAXParser. The byte representation of the data read in is correct, but in ToStream (or ToXMLStream), which generates the output, the wrong data is already part of the stream.
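For anyone trying to narrow this down: the second character in the example appears to lie outside the Basic Multilingual Plane, so Java stores it as a surrogate pair (two `char` values), and two replacement characters in the output would be consistent with that pair being split. Below is a minimal sketch for checking whether a string contains such characters and whether its surrogates are still correctly paired. The class name `SurrogateCheck` and its helper methods are hypothetical, and U+20000 is used only as a stand-in for any supplementary character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Xerces or Xalan.
class SurrogateCheck {

    // True if the string contains at least one code point above U+FFFF.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // True if every high surrogate is directly followed by a low surrogate
    // and no low surrogate stands alone.
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in supplementary character (U+20000), built from its code point
        // so the example does not depend on the source file encoding.
        String sample = new String(Character.toChars(0x20000));

        System.out.println(sample.length());                                // 2 (UTF-16 code units)
        System.out.println(sample.codePointCount(0, sample.length()));      // 1 (code point)
        System.out.println(sample.getBytes(StandardCharsets.UTF_8).length); // 4 (UTF-8 bytes)
        System.out.println(hasSupplementary(sample));                       // true
        System.out.println(isWellFormedUtf16(sample.substring(0, 1)));      // false (lone high surrogate)
    }
}
```

Running `isWellFormedUtf16` on the strings passed between the parser and the serializer could help show at which stage the pair is broken.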
I would like to debug this problem to find out what is happening, but I could not find a version of the Xerces jars that contains debugging information. Is one available somewhere?
I also tried to download the Xerces project. It seems to be available only via SVN, but I wasn't able to access it with a repository browser. Maybe I didn't use the correct URL...
Upvotes: 0
Views: 177
Reputation: 11
The problem with Java 11 was that you have to use a different XML serializer. In Java 8 the serializer included in the distribution is usually used; in Java 11 you have to use another one, usually Xalan. That serializer has a bug in the newest released version (2.7.3) related to high surrogate characters. Fortunately, a fix has already been committed (https://github.com/apache/xalan-java). You have to build your own jar that includes this fix; after that, the problem is gone.
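To see which implementation is actually doing the serialization, and whether it preserves a supplementary character, a small probe like the following may help. It is only a sketch: it uses the standard JAXP API, the class name `SerializerProbe` and its methods are hypothetical, and U+20000 is a stand-in test character. It does not reproduce the Xalan bug by itself; it just reports the factory that JAXP selected and round-trips one element through an identity transform.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper; uses only standard JAXP calls.
class SerializerProbe {

    // Name of the TransformerFactory implementation chosen by the JAXP lookup.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Runs the XML through an identity transform and returns the serialized text.
    static String identityTransform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Parses the XML and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String ch = new String(Character.toChars(0x20000)); // stand-in character
        String out = identityTransform("<a>" + ch + "</a>");
        System.out.println("factory: " + factoryClassName());
        System.out.println("character preserved: " + ch.equals(textOf(out)));
    }
}
```

Comparing the parsed text rather than the raw output keeps the check valid whether the serializer writes the character directly or as a numeric character reference. If a Xalan jar is on the classpath, the reported factory class would typically be under `org.apache.xalan` instead of the JDK-internal `com.sun.org.apache.xalan.internal` package, which is a quick way to confirm which serializer is in play before and after swapping in a patched jar.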
Upvotes: 0