lunatikz

Reputation: 736

XML escaping ignores accented characters

I am trying to send a POST request whose body contains XML. The receiving API demands that all special characters be encoded as numeric XML entities.

Take this example: İlkay Gündoğan

After XML-escaping with standard libraries such as org.apache.commons.text.StringEscapeUtils, or with Jsoup using its XML parser, only the ü is escaped; İ and ğ are left unchanged. I have already read the documentation of both libraries, which says that only a certain range of characters is escaped.

I already tried sending a manually crafted example, with İ, ü and ğ each written as a numeric entity, to the receiving API, and it worked as expected.

All values are written and read in UTF-8.

Upvotes: 0

Views: 58

Answers (1)

Joop Eggen

Reputation: 109557

If the XML encoding is UTF-8 (the default), then converting special characters to numeric entities is not needed. So you have a dubious receiver. escapeXml11 is indeed limited as the javadocs say.

To convert all non-ASCII characters of a String xml to numeric entities:

// Needs java.util.stream.Collectors; Character.toString(int) requires Java 11+.
xml = xml.codePoints()
    .mapToObj(cp -> cp < 128 ? Character.toString(cp) : String.format("&#%d;", cp))
    .collect(Collectors.joining());
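For anyone who wants to try the conversion in isolation, here is a minimal, self-contained sketch. The class name NumericEntityEscaper and the method name escapeNonAscii are made up for illustration; the only assumption is that every code point above 127 should become a decimal character reference.

```java
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
final class NumericEntityEscaper {

    // Replaces every code point above 127 with a decimal character reference.
    // Iterating over code points (not chars) keeps surrogate pairs together.
    static String escapeNonAscii(String text) {
        return text.codePoints()
                .mapToObj(cp -> cp < 128
                        ? String.valueOf((char) cp)
                        : "&#" + cp + ";")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // The name from the question, written with Unicode escapes so the
        // source file does not depend on the compiler's default encoding.
        System.out.println(escapeNonAscii("\u0130lkay G\u00fcndo\u011fan"));
        // prints: &#304;lkay G&#252;ndo&#287;an
    }
}
```

Note that this sketch leaves &, < and > untouched, so it is meant to run on XML that is already serialized (or on text that has already been through regular XML escaping).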

You might even declare encoding="US-ASCII" in the XML declaration, since the result then contains only ASCII characters.
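If the XML is built with the JDK's own APIs, one possible way to act on the US-ASCII idea is to let the serializer do the work. The sketch below assumes the built-in javax.xml.transform implementation, which, as far as I know, writes a numeric character reference for any character the requested output encoding cannot represent; other Transformer implementations may behave differently. The class name AsciiXmlWriter and the method name toAsciiXml are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of any library.
final class AsciiXmlWriter {

    // Builds <elementName>text</elementName> and serializes it as US-ASCII.
    static String toAsciiXml(String elementName, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement(elementName);
            root.setTextContent(text);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Ask for an ASCII-only document; this also sets the encoding
            // named in the XML declaration.
            transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAsciiXml("player", "\u0130lkay G\u00fcndo\u011fan"));
    }
}
```

The advantage of this route is that &, < and > in the text are escaped by the same serializer, so no separate escaping step is needed.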

Upvotes: 3
