Mojibakes in SOAP message

Question

In my Java web service I've implemented a WebServiceProvider and am trying to get the original request sent by the client. The problem is that I'm getting unreadable characters like ÐœÐ¾ÑÐºÐ²Ð° inside the XML tags of the SOAP message body instead of normal Cyrillic letters, so I'm looking for a way to fix this. I could probably use generic type instead of , but I don't know how to turn it into bytes.
Q1: Is it possible to get the client's request as the original array of bytes (raw binary data) so that I can decode it manually?
Q2: Is there a direct way to fix the wrong characters by specifying the decoding character set for the SOAP message?

My current code is given below:

@WebServiceProvider(
    portName="SoaprequestImplPort",
    serviceName="services/soaprequest",
    targetNamespace="http://tempuri.org/soaprequest",
    wsdlLocation="/wsdl/SoaprequestImpl.wsdl"
)
@BindingType(value="http://schemas.xmlsoap.org/wsdl/soap/http")
@ServiceMode(value=javax.xml.ws.Service.Mode.MESSAGE)
public class SoaprequestImpl implements Provider {

    private static final String hResponse = "

patthoyts · Accepted Answer

What you have shown is just the UTF-8 encoded representation of "Москва", displayed as if each byte were a separate single-byte character. Your SOAP data is most likely in an XML document with an XML declaration at the top stating that the content is encoded in UTF-8. To turn such data back into Unicode you need to decode it. You also have some HTML escapes in there, so you must unescape those first. I used Tcl to test this:

# The original string reported
set s "ÐœÐ¾ÑÐºÐ²Ð°"
# substituting the html escapes
set t "Ð\x9cÐ¾Ñ\x81ÐºÐ²Ð°"
# decode from utf-8 into Unicode
encoding convertfrom utf-8 $t
# => Москва

So your SOAP data is probably fine, but you most likely need to deal with the HTML escapes before anything tries to decode the string from UTF-8.
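Since the question is about Java, the same two steps (unescape, then decode as UTF-8) can be sketched there as well. This is only an illustration under some assumptions: the class and method names (MojibakeFix, unescapeNumericRefs, repair) are made up for this example, the escapes are assumed to be numeric character references such as &#156; or &#x9c;, and the text is assumed to have been decoded as ISO-8859-1, so that every character maps back to exactly one byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MojibakeFix {

    // Assumption: escapes look like &#156; (decimal) or &#x9c; (hex).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(x[0-9a-fA-F]+|[0-9]+);");

    // Replace each numeric character reference with the character it names.
    public static String unescapeNumericRefs(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ref = m.group(1);
            int codePoint = (ref.charAt(0) == 'x')
                    ? Integer.parseInt(ref.substring(1), 16)
                    : Integer.parseInt(ref, 10);
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Undo the wrong decoding: turn each char (0..255) back into one byte,
    // then decode the resulting byte sequence as UTF-8.
    public static String repair(String garbled) {
        String unescaped = unescapeNumericRefs(garbled);
        byte[] raw = unescaped.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The garbled string, written with \u escapes so that the source
        // file encoding does not matter; &#156; and &#129; stand in for
        // the HTML escapes (an assumption about their original form).
        String garbled = "\u00D0&#156;\u00D0\u00BE\u00D1&#129;"
                + "\u00D0\u00BA\u00D0\u00B2\u00D0\u00B0";
        // Should print the Cyrillic word if the console uses UTF-8.
        System.out.println(repair(garbled));
    }
}
```

Note that this only round-trips if the wrong decoding really was ISO-8859-1. If the garbled text contains characters such as œ (U+0153), it was probably decoded as windows-1252 instead, and getBytes with ISO-8859-1 would replace those characters with '?'. Fixing the charset where the request is first read is the more robust option; repairing the string afterwards is a workaround.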
