Reputation: 2939
When using HTTP Components (a Java library for HTTP), the response I get has ' displayed as Æ and - displayed as ȗ.
Upvotes: 1
Views: 1350
Reputation: 4537
With the latest 4.x version, you can use something like the following to avoid hard-coding the charset:
HttpEntity entity = response.getEntity();
Charset charset = ContentType.getOrDefault(entity).getCharset();
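As far as I know, getCharset() can still come back null when the Content-Type header carries no charset parameter, so a fallback is worth having. Below is a rough, standard-library-only sketch of the same idea. CharsetResolver and resolve are hypothetical names made up for illustration and are not part of HttpComponents, and the parsing is deliberately naive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpComponents: picks a charset from a
// raw Content-Type header value, falling back when none is usable.
public class CharsetResolver {

    static Charset resolve(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback; // the server sent no Content-Type at all
        }
        // Naive split on ';' to find a "charset=..." parameter.
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback; // header present but without a charset parameter
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(resolve("text/html", StandardCharsets.ISO_8859_1));
        System.out.println(resolve(null, StandardCharsets.UTF_8));
    }
}
```

Which fallback to pass is a judgment call: ISO-8859-1 is the traditional HTTP default for text, but if you already know the server actually sends UTF-8, passing UTF-8 is the pragmatic choice.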
Upvotes: 0
Reputation: 105220
OK, so basically you are getting a response without a Content-Type header from a server you do not control, and you're having encoding issues.
In Java, every String is handled internally as Unicode, regardless of the encoding the data arrived in.
So I'm guessing your problem is where you are displaying these characters, either to the console or to a file.
The console uses the platform default charset to print characters. On my machine, for example, that is MacRoman, not UTF-8.
So what you need is to get the raw bytes from the response and do something like this:
System.out.println(new String(raw_byte_array, "utf-8"));
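To make the point concrete, here is a small self-contained sketch using only the standard library. It assumes the server really sends UTF-8, and the class name DecodeDemo and the sample text are made up for illustration. It decodes the same bytes with the right and the wrong charset, then prints through a PrintStream with an explicit encoding instead of relying on the platform default.

```java
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Decode raw response bytes with an explicitly chosen charset.
    static String decode(byte[] rawBytes, Charset charset) {
        return new String(rawBytes, charset);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the raw response body: "It’s" encoded as UTF-8.
        // U+2019 (right single quotation mark) takes three bytes in UTF-8.
        byte[] raw = "It\u2019s".getBytes(StandardCharsets.UTF_8);

        String right = decode(raw, StandardCharsets.UTF_8);      // 4 characters
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // 6 characters, mojibake

        // The platform default is what System.out typically uses.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Print with an explicit encoding so output does not depend on the default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(right);
        out.println(wrong);
    }
}
```

Whether the second PrintStream output looks right still depends on the terminal itself being set to UTF-8; the code only controls which bytes are written.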
Also, this might shed some light on the matter:
http://download.oracle.com/javase/tutorial/i18n/text/string.html
Upvotes: 1