Reflux

Reputation: 2939

HTTP Components encoding problem

When using HttpComponents (a Java library for HTTP), the response I get has ' displayed as Æ and - displayed as ȗ.

Upvotes: 1

Views: 1350

Answers (2)

asyncwait

Reputation: 4537

With version 4.x, you can use something like the following to be charset-agnostic:

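HttpEntity entity = response.getEntity();
Charset charset = ContentType.getOrDefault(entity).getCharset();
// getCharset() returns null when the Content-Type has no charset parameter, so fall back explicitly
String body = EntityUtils.toString(entity, charset != null ? charset : StandardCharsets.UTF_8);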
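To show what the charset-agnostic approach boils down to without depending on HttpComponents, here is a minimal plain-JDK sketch. The class and method names (`ContentTypeCharset`, `charsetFrom`, `decode`) are hypothetical, the header parsing is deliberately simplified, and falling back to UTF-8 when no charset is declared is an assumption about the server. As far as I can tell, `ContentType.getOrDefault` itself falls back to `text/plain` with ISO-8859-1 when the entity has no Content-Type, which would garble a UTF-8 body.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: extracts the charset parameter from a Content-Type header value,
    // returning the fallback when the header or the parameter is missing or unknown.
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes a response body with the declared charset, or the fallback if none is declared.
    public static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, charsetFrom(contentType, fallback));
    }

    public static void main(String[] args) {
        byte[] body = "it\u2019s \u2013 ok".getBytes(StandardCharsets.UTF_8);
        // No Content-Type from the server: assume UTF-8 rather than ISO-8859-1
        System.out.println(decode(body, null, StandardCharsets.UTF_8));
    }
}
```

The fallback is a parameter rather than a constant because the right guess depends on the server you are talking to.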

Upvotes: 0

Pablo Fernandez

Reputation: 105220

OK, so you are getting a response without a Content-Type header from a server you don't control, and you're having encoding issues.

In Java, every string is handled internally as Unicode, regardless of the encoding the original bytes came in.

So my guess is that the problem lies where you display these characters, either on the console or in a file.

The console uses the platform's default charset to print characters. On my machine, for example, that is MacRoman, not UTF-8.
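A minimal sketch of checking the default charset and writing to standard output as UTF-8 explicitly. `ConsoleCharset` and `printTo` are hypothetical names; `printTo` writes into an in-memory buffer so you can see which bytes a given charset actually produces. An explicit UTF-8 stream only helps if the terminal itself decodes UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleCharset {

    // Prints text through a PrintStream using the given charset and returns the encoded bytes.
    // Characters the charset cannot represent are replaced (typically with '?').
    public static byte[] printTo(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, charset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for a Charset instance that already exists
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The charset the JVM picked up from the platform
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Standard output wrapped in a PrintStream that always encodes as UTF-8
        PrintStream utf8Out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        utf8Out.println("it\u2019s \u2013 ok");
    }
}
```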

So what you need is to get the raw bytes from the response and decode them explicitly, something like this:

byte[] rawBytes = EntityUtils.toByteArray(response.getEntity());
System.out.println(new String(rawBytes, StandardCharsets.UTF_8));
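Because a Java String is Unicode internally, the charset only matters at the point where bytes become a String (or a String becomes bytes). The sketch below decodes the same UTF-8 bytes once with the matching charset and once with ISO-8859-1; `DecodeDemo` and `reencode` are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Encodes s with one charset and decodes the resulting bytes with another.
    public static String reencode(String s, Charset written, Charset readAs) {
        return new String(s.getBytes(written), readAs);
    }

    public static void main(String[] args) {
        String original = "it\u2019s"; // U+2019 RIGHT SINGLE QUOTATION MARK

        // Same charset on both sides: the string survives unchanged
        String right = reencode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // U+2019 is 3 bytes in UTF-8 (E2 80 99); ISO-8859-1 turns each byte into its own character
        String wrong = reencode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // prints true
        System.out.println(wrong.length());         // prints 6
    }
}
```

If the decoding step is wrong, the String is already broken before it reaches the console, so it is worth checking both places.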

Also, this might shed some light on the matter:

http://download.oracle.com/javase/tutorial/i18n/text/string.html

Upvotes: 1
