Reputation: 590
I'm having some encoding problems in a Java application that makes HTTP requests to an IIS server.
Iterating over the headers of the URLConnection object, I can see the following (relevant) headers:
Transfer-Encoding: [chunked]
Content-Encoding: [utf-8]
Content-Type: [text/html; charset=utf-8]
The URLConnection.getContentEncoding() method returns utf-8 as the document encoding.
This is how the HTTP request and the stream read are made:
    OutputStreamWriter sw = null;
    BufferedReader br = null;
    char[] buffer = null;
    URL url;
    url = new URL(this.URL);
    URLConnection connection = url.openConnection();
    connection.setDoOutput(true);
    sw = new OutputStreamWriter(connection.getOutputStream());
    sw.write(postData);
    sw.flush();
    br = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF8"));
    StringBuilder totalResponse = new StringBuilder();
    String line;
    while ((line = br.readLine()) != null) {
        totalResponse.append(line);
    }
    buffer = totalResponse.toString().toCharArray();
    if (sw != null)
        sw.close();
    if (br != null)
        br.close();
    return buffer;
However the following string sent by the server "ÃÃÃção" is received by the client as "�����o".
What am I doing wrong?
Upvotes: 0
Views: 1089
Reputation: 21
Can you try putting the stream in a request attribute and then printing it out on the client side? A request attribute will be received as is, without any encoding issues.
Upvotes: 0
Reputation: 1973
Based on your comments, you are trying to receive a FIX message from an IIS server, and FIX uses ASCII. Only a small subset of tags supports other encodings, and those tags have to be treated in a special manner (the non-ASCII tags in the standard FIX spec are 349, 351, 353, 355, 357, 359, 361, 363 and 365). If such tags are present, you will get a tag 347 whose value specifies the encoding (for example UTF-8), and each encoded tag will be preceded by a tag giving the length of the encoded value that follows (for tag 349, you will always get 348 first, with an integer value).
In your case, it looks like the server is sending a custom tag 10411 (the 10xxx range) in some other encoding. By convention, the preceding tag 10410 should give you the length of the value in 10411, but it contains "0000" instead, which may have some other meaning.
Note that although FIX messages are very readable, they should still be treated as binary data. Tags and values are mostly ASCII characters, but the delimiter (SOH) is 0x01 and, as mentioned above, certain tags may use another encoding. The IIS service should really return the data as application/octet-stream so it can be received properly. Attempting to return it as text/html is asking for trouble :).
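To illustrate what "treat it as binary" could look like on the Java side, here is a minimal sketch that splits a raw FIX message into tag/value pairs at the byte level and only decodes values afterwards. The class name FixFieldSketch and its methods are hypothetical, and the sketch assumes that each standard length tag is the encoded data tag minus one (348 before 349, and so on); custom tags such as 10410/10411 are not handled and would need to be added by hand.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any FIX library.
public class FixFieldSketch {
    private static final byte SOH = 0x01;

    /** One tag=value pair; the value is kept as raw bytes. */
    public static final class Field {
        public final int tag;
        public final byte[] value;

        public Field(int tag, byte[] value) {
            this.tag = tag;
            this.value = value;
        }

        /** Decodes the raw value with the charset the caller chooses. */
        public String text(Charset charset) {
            return new String(value, charset);
        }
    }

    /** Assumption: standard length tags are 348, 350, ..., 364. */
    public static boolean isLengthTag(int tag) {
        return tag >= 348 && tag <= 364 && tag % 2 == 0;
    }

    public static List<Field> parse(byte[] msg) {
        List<Field> fields = new ArrayList<>();
        int pos = 0;
        int pendingLength = -1; // byte length announced by a preceding length tag
        while (pos < msg.length) {
            int eq = indexOf(msg, (byte) '=', pos);
            if (eq < 0) {
                throw new IllegalArgumentException("No '=' after offset " + pos);
            }
            int tag = Integer.parseInt(
                    new String(msg, pos, eq - pos, StandardCharsets.US_ASCII));
            int start = eq + 1;
            // An encoded value may itself contain SOH, so trust the announced length.
            int end = pendingLength >= 0 ? start + pendingLength
                                         : indexOf(msg, SOH, start);
            pendingLength = -1;
            if (end < 0 || end >= msg.length || msg[end] != SOH) {
                throw new IllegalArgumentException("Field " + tag + " is not SOH-terminated");
            }
            byte[] value = Arrays.copyOfRange(msg, start, end);
            fields.add(new Field(tag, value));
            if (isLengthTag(tag)) {
                pendingLength = Integer.parseInt(
                        new String(value, StandardCharsets.US_ASCII));
            }
            pos = end + 1;
        }
        return fields;
    }

    private static int indexOf(byte[] data, byte target, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }
}
```

The bytes would come from connection.getInputStream() read into a byte array, without any Reader in between. Ordinary fields can then be decoded with StandardCharsets.US_ASCII, and encoded fields with whatever charset tag 347 names.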
Upvotes: 1
Reputation: 109532
For good order, a couple of corrections:
    URLConnection connection = url.openConnection();
    connection.setDoOutput(true);
    connection.connect();
    try (Writer sw = new OutputStreamWriter(connection.getOutputStream(),
            StandardCharsets.UTF_8)) {
        sw.write(postData);
        sw.flush();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(connection.getInputStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder totalResponse = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                totalResponse.append(line).append("\r\n");
            }
            return totalResponse.toString().toCharArray();
        } // Close br.
    } // Close sw.
Maybe:
postData = ... + "Accept-Charset: utf-8\r\n" + ...;
With totalResponse.toString() you should then have everything read correctly.
But when the text is displayed, the String/char data is converted to bytes again, and that is where the encoding can fail. For instance, System.out.println may not work, because it probably uses the Windows default encoding.
You can test the String by dumping its bytes:
    String s = totalResponse.toString();
    Logger.getLogger(getClass().getName()).log(Level.INFO, "{0}",
            Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
In some rare cases the font will not contain the special characters.
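If the console encoding is the suspect, one way to rule it out is to wrap the output stream in a PrintStream with an explicit charset. This is only a sketch: the class name Utf8PrintSketch and the helper utf8 are made up for illustration, and whether the console itself renders UTF-8 still depends on the terminal and font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration only.
public class Utf8PrintSketch {

    /** Wraps a stream so that text is always encoded as UTF-8, with auto-flush on println. */
    public static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Escapes keep this source file independent of the editor's encoding.
        out.println("\u00e7\u00e3o");
    }
}
```

If the characters look right through this stream but wrong through plain System.out, the response was read correctly and only the display step is at fault.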
Upvotes: 0
Reputation: 41997
If the server really sends a Content-Encoding of "UTF-8" then it is very confused. See http://svn.tools.ietf.org/svn/wg/httpbis/specs/rfc7231.html#header.content-encoding
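In other words, Content-Encoding names a content coding such as gzip, while the character set travels in the charset parameter of Content-Type. A client that wants to pick the charset from the response could do something like the following sketch; the class ContentTypeCharsetSketch and the method charsetOf are hypothetical names, and the parsing is deliberately simplistic.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration only.
public class ContentTypeCharsetSketch {

    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/html; charset=utf-8". Returns the fallback if the header is
     * missing, has no charset parameter, or names an unknown charset.
     */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The parameter value may be quoted.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

With a URLConnection this could be used as new InputStreamReader(connection.getInputStream(), ContentTypeCharsetSketch.charsetOf(connection.getContentType(), StandardCharsets.UTF_8)), instead of relying on getContentEncoding().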
Upvotes: 0