M4tchB0X3r

Reputation: 1531

StringBuilder append String breaks UTF-8

I'm sending XML to a server with HttpPost. This used to work fine, and I'm doing it successfully in other parts of the project.

I'm using a StringBuilder to create the XML request, but since I started appending strings as data to the nodes, I have been getting an error response from the parser on the server:

Invalid byte 2 of 2-byte UTF-8 sequence.

When I log the request and check it in the W3C XML validator, there are no errors.
This is an excerpt from my StringBuilder method (the whole method would be too big and contains sensitive data):

        StringBuilder baseDocument = new StringBuilder();
        baseDocument.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><request><setDisposalRequest><customer><company><![CDATA[");
        baseDocument.append(company);
        baseDocument.append("]]></company>");
        baseDocument.append("<firstName><![CDATA[");
        baseDocument.append(name);
        baseDocument.append("]]></firstName>");
        ...

As soon as I replace the String variables I append with hardcoded strings, everything works fine,

i.e. changing

        baseDocument.append(name);

to

        baseDocument.append("name");

All the strings have values; none of them is null or empty.
Before sending the request I set the content type of the StringEntity to XML:

        se.setContentType("application/xml");

What am I missing?

Upvotes: 1

Views: 4378

Answers (1)

Joachim Sauer

Reputation: 308021

Your XML header claims that it's UTF-8, yet you never mention whether you actually write UTF-8. Make sure the actual bytes you send are UTF-8 encoded. The error message suggests that you're using another encoding (probably an ISO-8859-* variant).
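
To see why hardcoded ASCII strings work while real data fails, here is a minimal sketch using only the JDK. The company value is hypothetical; the assumption is that the real data contains a non-ASCII character such as "Ö". Encoded as ISO-8859-1, that character becomes the single byte 0xD6, which a UTF-8 parser reads as the start of a 2-byte sequence and then rejects because the next byte is not a continuation byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Strictly decodes the bytes as UTF-8; returns false on any malformed sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical value; \u00d6 is "Ö", which is not an ASCII character.
        String company = "\u00d6l GmbH";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><company><![CDATA["
                + company + "]]></company>";

        // Same String, two different byte representations.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

If the request body is built with Apache HttpClient 4.x, one likely culprit is a StringEntity created without a charset, which falls back to ISO-8859-1. Passing the charset explicitly, for example `new StringEntity(xml, "UTF-8")`, should make the bytes match the XML declaration. How the entity is constructed is not shown in the question, so treat this as an assumption to verify.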

This is another reason that manually constructing XML like this is dangerous: there are just too many corner cases to observe and it's much easier to use a real XML handling library. Those tend to get the corner cases correct ;-)
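
As a rough illustration of the library approach, the sketch below writes the same elements with the StAX `XMLStreamWriter` from the JDK. It assumes a Java SE runtime where `javax.xml.stream` is available; the class and method names are made up for this example, and only the elements visible in the question's excerpt are written. The writer encodes to UTF-8 itself and escapes special characters, so no CDATA wrapping is needed.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class DisposalRequestWriter {

    // Hypothetical helper: returns the request as UTF-8 encoded bytes.
    public static byte[] build(String company, String name) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");

            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("request");
            w.writeStartElement("setDisposalRequest");
            w.writeStartElement("customer");

            w.writeStartElement("company");
            w.writeCharacters(company); // escapes &, < and > as needed
            w.writeEndElement();

            w.writeStartElement("firstName");
            w.writeCharacters(name);
            w.writeEndElement();

            w.writeEndElement(); // customer
            w.writeEndElement(); // setDisposalRequest
            w.writeEndElement(); // request
            w.writeEndDocument();
            w.close();

            return out.toByteArray();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not build request XML", e);
        }
    }
}
```

The resulting byte array can be sent as the request body directly, which avoids a second String-to-bytes conversion where the wrong charset could slip in.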

And no: StringBuilder certainly does not break UTF-8. The problem is somewhere else.

Upvotes: 1
