Reputation: 118
I am trying to parse an XML document that contains the hexadecimal character reference &#x1D4C5;, which represents the mathematical symbol 𝓅. The output I am getting is ��.
What am I doing wrong?
Example input XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<data>&#x1D4C5;</data>
</root>
Output:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<data>��</data>
</root>
Code to obtain the XMLReader:
factory = org.apache.xerces.jaxp.SAXParserFactoryImpl.newInstance();
final XMLReader xmlReader;
xmlReader = factory.newSAXParser().getXMLReader();
The input is decoded as UTF-8 while parsing.
This is the method I use to read and write the XML:
public void readAndWriteXml(InputSource inputSource, OutputStream out)
        throws IOException, SAXException, ParserConfigurationException {
    XMLReader xmlReader = getXmlReader();
    Serializer serializer = SerializerFactory.getSerializer(configProps);
    serializer.setOutputStream(out);
    xmlReader.setContentHandler(serializer.asContentHandler());
    if (logger != null) {
        getLogger().debug("starting xml parsing" + LocalTime.now());
    }
    xmlReader.parse(inputSource);
    if (logger != null) {
        getLogger().debug("end xml parsing" + LocalTime.now());
    }
}
The body of getXmlReader() is this:
final XMLReader xmlReader;
xmlReader = factory.newSAXParser().getXMLReader();
xmlReader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
xmlReader.setFeature("http://xml.org/sax/features/namespaces", true);
xmlReader.setFeature("http://xml.org/sax/features/external-parameter-entities", true);
// xmlReader.setFeature("http://xml.org/sax/features/validation", true);
xmlReader.setEntityResolver(wrappedEntityResolver);
xmlReader.setErrorHandler(new SaxErrorHandler());
return xmlReader;
Here is where I initialise the class:
public XmlNormalizer(String catalogPath) throws IOException {
    // We want the Apache XML parser, not the embedded Oracle Java version.
    factory = org.apache.xerces.jaxp.SAXParserFactoryImpl.newInstance();
    factory.setNamespaceAware(true);
    List<Path> catalogFiles = this.findByFileName(new File(catalogPath).toPath(), CATALOG_FILENAME_PATTERN);
    String[] catalogArray = catalogFiles.stream().map(Path::toString).toArray(String[]::new);
    configProps = OutputPropertiesFactory.getDefaultMethodProperties("xml");
    XMLCatalogResolver xmlCatalogResolver = new XMLCatalogResolver(catalogArray, true);
    wrappedEntityResolver = new WrappedEntityResolver(xmlCatalogResolver);
}
WrappedEntityResolver is just a wrapper around org.apache.xerces.util.XMLCatalogResolver.
Upvotes: 2
Views: 94
Reputation: 163587
That output is most definitely wrong, but it's hard to tell why.
What are the properties passed to the serializer?
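One thing that may help narrow it down: 𝓅 is U+1D4C5, which lies outside the Basic Multilingual Plane, so Java holds it as two UTF-16 code units (a surrogate pair). Two broken characters in the output would be consistent with something in the pipeline handling each half on its own, though that is only a hypothesis. A minimal sketch using only the JDK (the class and method names are made up for illustration):

```java
public class SurrogateDemo {

    // Returns the UTF-16 code units of a code point as space-separated hex.
    static String describe(int codePoint) {
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int scriptSmallP = 0x1D4C5; // MATHEMATICAL SCRIPT SMALL P

        // Code points above U+FFFF need two Java chars.
        System.out.println(Character.isSupplementaryCodePoint(scriptSmallP)); // true
        System.out.println(Character.charCount(scriptSmallP));                // 2
        System.out.println(describe(scriptSmallP));                           // D835 DCC5
    }
}
```

If the output file contains references to 55349 and 56517 (the decimal values of D835 and DCC5), or two replacement characters, the pair is probably being split somewhere between the parser and the output stream.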
If you serialize with Saxon, then with default encoding (UTF-8) the output is
<?xml version="1.0" encoding="UTF-8"?><root>
<data>𝓅</data>
</root>
while with encoding=us-ascii the output is:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<data>&#x1d4c5;</data>
</root>
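To run the same comparison without Saxon, here is a rough sketch that pushes SAX events through the JDK's built-in identity transformer and changes only the output encoding. It deliberately uses the default JAXP parser and serializer rather than the Xerces factory and Xalan Serializer from the question, so it shows how to test the effect of the encoding property, not what your setup will print. The class name and the serialize helper are hypothetical, and the exact form of the output can vary between JDK versions.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class EncodingRoundTrip {

    // Parses the XML with SAX and re-serializes it with the given output encoding.
    static byte[] serialize(String xml, String encoding) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();

        // An identity TransformerHandler acts as a serializing ContentHandler.
        SAXTransformerFactory tf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = tf.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, encoding);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        handler.setResult(new StreamResult(out));

        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><data>&#x1D4C5;</data></root>";
        for (String encoding : new String[] {"UTF-8", "US-ASCII"}) {
            byte[] bytes = serialize(xml, encoding);
            // Decoding as UTF-8 is safe for both: US-ASCII is a subset of UTF-8.
            System.out.println(encoding + ": "
                    + new String(bytes, StandardCharsets.UTF_8));
        }
    }
}
```

If the US-ASCII run produces a single character reference while your own pipeline produces two, that points at the serializer or its properties rather than at the parser.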
Upvotes: 1