XML support for new UTF-8 like smileys

Question

We have a mobile client that communicates with the server using XML. I have run into a problem when we need to send some of the more recent emoji smileys, which new phones have made very easy to access. For instance: 😉😯🙃😡😬😠.

Now, my Android application has no issue with encoding and sending this, but on the server side things tend to be a bit more explodey.

If we try to send a message containing any of the smileys above, we get a huge stack trace; the relevant part is:

javax.xml.transform.TransformerException: org.xml.sax.SAXException: Invalid UTF-16 surrogate detected: d83d d83d ?
java.io.IOException: Invalid UTF-16 surrogate detected: d83d d83d ?
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
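The hex values in this message are UTF-16 code units. All of the smileys above lie outside the Basic Multilingual Plane, so a Java String stores each of them as a surrogate pair: a high surrogate (D800 to DBFF) followed by a low surrogate (DC00 to DFFF). "d83d d83d" is therefore a high surrogate followed by another high surrogate, which is not a valid pair. A small sketch (the class name SurrogateDump is only for illustration) that prints the code units of the winking face, U+1F609:

```java
// SurrogateDump is an illustrative name; it only inspects UTF-16 code units.
final class SurrogateDump {
    private SurrogateDump() {}

    /** Returns the UTF-16 code units of s as lowercase hex, separated by spaces. */
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wink = "\uD83D\uDE09"; // U+1F609, written with escapes to avoid source-encoding issues
        System.out.println(wink.length());                            // 2 UTF-16 code units
        System.out.println(wink.codePointCount(0, wink.length()));    // 1 code point
        System.out.println(codeUnits(wink));                          // d83d de09
        System.out.println(Integer.toHexString(wink.codePointAt(0))); // 1f609

        // Two high surrogates in a row, as in the error message, do not form a pair.
        String broken = "\uD83D\uD83D";
        System.out.println(Character.isSurrogatePair(broken.charAt(0), broken.charAt(1))); // false
        System.out.println(Character.isSurrogatePair(wink.charAt(0), wink.charAt(1)));     // true
    }
}
```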

And if we try to parse it:

2017-01-13 14:00:22,717 - com.zylinc.core.gatekeeper.stripes.DoBean - WARN - Could not handle request
org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 93; Character reference "&#
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.zylinc.core.gatekeeper.stripes.DoBean.parseRequest(DoBean.java:127)
        at com.zylinc.core.gatekeeper.stripes.DoBean.execute(DoBean.java:56)
        at com.zylinc.core.gatekeeper.Dispatcher.onRequest(Dispatcher.java:107)
        at com.zylinc.core.gatekeeper.io.UntrustedSocketListener.handleRequest(UntrustedSocketListener.java:16)
        at com.zylinc.core.gatekeeper.io.SocketListener$MessageHandler.run(SocketListener.java:228)
        at java.lang.Thread.run(Unknown Source)
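The truncated parse error starts with "Character reference", which suggests the XML contained numeric character references. A reference has to name a whole code point (&#128521; for U+1F609); references to the two surrogate halves (&#55357;&#56841;) are not allowed, because surrogates are not XML characters. A minimal sketch that uses the same SAX XMLReader setup as the parsing code in this question to check three spellings of the same smiley (the class name CharRefCheck and the subject attribute are assumptions for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// CharRefCheck and the "subject" attribute are illustrative assumptions.
final class CharRefCheck {
    private CharRefCheck() {}

    /** Parses xml with a SAX XMLReader and returns the first "subject" attribute seen. */
    static String subjectOf(String xml) throws Exception {
        final String[] subject = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (subject[0] == null) {
                    subject[0] = atts.getValue("subject");
                }
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // DefaultHandler.fatalError throws the SAXParseException
        reader.parse(new InputSource(new StringReader(xml)));
        return subject[0];
    }

    public static void main(String[] args) throws Exception {
        String wink = "\uD83D\uDE09"; // U+1F609 as a surrogate pair in a Java string

        // 1. The character itself, written directly into the document.
        System.out.println(wink.equals(subjectOf("<action subject=\"" + wink + "\"/>")));
        // 2. One reference to the full code point: 128521 = 0x1F609.
        System.out.println(wink.equals(subjectOf("<action subject=\"&#128521;\"/>")));
        // 3. References to the surrogate halves: 55357 = 0xD83D, 56841 = 0xDE09.
        try {
            subjectOf("<action subject=\"&#55357;&#56841;\"/>");
            System.out.println("accepted (unexpected)");
        } catch (SAXParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

On a conforming parser the first two spellings come back as the winking face and the third is rejected. If the XML the server fails on contains surrogate references like these, the problem lies on the side that wrote the XML rather than in the parser.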

In that case the XML is:

Now, this seems to work just fine when outputting JSON, but moving the clients to JSON is not something we can do overnight. I'm guessing it breaks because these characters are too new for our Java version, but it would be nice to ensure that newer smileys will never break the messaging.
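Whether a character may appear in an XML 1.0 document is decided by the Char production of the XML specification, not by which Unicode version the JDK knows about: every code point from U+10000 to U+10FFFF is allowed, whether or not it has been assigned yet. A sketch of that check (XmlChars is a hypothetical helper, and it assumes XML 1.0; XML 1.1 treats control characters differently):

```java
// XmlChars is a hypothetical helper; the ranges are the XML 1.0 (Fifth Edition) Char production.
final class XmlChars {
    private XmlChars() {}

    /** Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point in s may appear in an XML 1.0 document. */
    static boolean isXmlSafe(String s) {
        // codePoints() joins valid surrogate pairs and passes unpaired surrogates
        // through as values in D800-DFFF, which isXmlChar rejects.
        return s.codePoints().allMatch(XmlChars::isXmlChar);
    }

    public static void main(String[] args) {
        System.out.println(isXmlSafe("Hi \uD83D\uDE09")); // true: U+1F609
        System.out.println(isXmlChar(0x1FAFF));          // true: allowed whether or not it is assigned
        System.out.println(isXmlSafe("\uD83D\uD83D"));    // false: two high surrogates
        System.out.println(isXmlSafe("a\0b"));            // false: U+0000 is never allowed
    }
}
```

Because the check works on code points rather than on a list of known emoji, smileys added in later Unicode versions pass it as long as the string carrying them is well-formed UTF-16.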

The code for parsing the XML is pretty straightforward:

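import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
XMLReader xmlReader = parser.getXMLReader();
xmlReader.setContentHandler(handler); // handler: our ContentHandler implementation
xmlReader.parse(new InputSource(new StringReader(xml)));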

Edit:

Creating the XML is done like this:

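import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// mDoc and mRoot are fields of the surrounding class
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
mDoc = builder.newDocument();
mRoot = mDoc.createElement("action");
mDoc.appendChild(mRoot);

TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer trans = transFactory.newTransformer();
trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
trans.setOutputProperty(OutputKeys.INDENT, "yes");
trans.setOutputProperty(OutputKeys.VERSION, "1.1");

StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
DOMSource source = new DOMSource(mDoc);
trans.transform(source, result);

return sw.toString();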
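This snippet writes into a StringWriter, so the transformer never produces bytes itself; whatever later turns that string into bytes has to use the same encoding that the XML declaration names. One way to answer the encoding question empirically is to serialize straight to bytes with an explicit OutputKeys.ENCODING and parse the result back. A minimal sketch, assuming the element and attribute names action and subject, XML 1.0 output without indentation, and a hypothetical class name RoundTrip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// RoundTrip, "action" and "subject" are illustrative; XML 1.0, no indentation.
final class RoundTrip {
    private RoundTrip() {}

    /** Builds <action subject="..."/> and serializes it to UTF-8 bytes. */
    static byte[] serialize(String subject) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("action");
        root.setAttribute("subject", subject);
        doc.appendChild(root);

        Transformer trans = TransformerFactory.newInstance().newTransformer();
        // The declared encoding and the encoding of the bytes written are the same by construction.
        trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    /** Parses the bytes back and returns the root's "subject" attribute. */
    static String parseSubject(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getAttribute("subject");
    }

    public static void main(String[] args) throws Exception {
        String subject = "Hi \uD83D\uDE09"; // "Hi " followed by U+1F609
        byte[] xml = serialize(subject);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(subject.equals(parseSubject(xml)));
    }
}
```

If this round trip fails on the server's JDK but succeeds on a newer one, that points at the serializer bundled with that JDK rather than at the application code.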

The text itself is added simply with:

xml.setAttribute(SUBJECT, obj.getSubject());
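If the subject string can arrive with a broken pair (the "d83d d83d" in the first trace is a high surrogate followed by another high surrogate), one defensive option is to clean the string before it reaches the DOM. A minimal sketch, assuming that replacing each unpaired surrogate with U+FFFD (the Unicode replacement character) is acceptable for these messages; XmlText.replaceLoneSurrogates is a hypothetical helper:

```java
// XmlText.replaceLoneSurrogates is a hypothetical helper, not part of any library.
final class XmlText {
    private XmlText() {}

    /**
     * Returns s with every unpaired UTF-16 surrogate replaced by U+FFFD.
     * Well-formed surrogate pairs (such as emoji) are copied unchanged.
     */
    static String replaceLoneSurrogates(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Valid pair: copy both halves and skip over the low surrogate.
                sb.append(c).append(s.charAt(i + 1));
                i++;
            } else if (Character.isSurrogate(c)) {
                // Lone high or low surrogate: cannot be serialized as XML.
                sb.append('\uFFFD');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The call site would then become xml.setAttribute(SUBJECT, XmlText.replaceLoneSurrogates(obj.getSubject())). Replacing rather than silently dropping keeps it visible that something was lost; if the pair is being broken earlier, for example on the client or in transport, this only hides the symptom.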

Do I have to specify some encoding or other?
