rosszt

Reputation: 21

Java - handling foreign characters

So, I have some Java code that fetches the contents of an HTML page as follows:

BufferedReader bf;
String response = "";
HttpURLConnection connection;
try 
{
    // 'url' is a java.net.URL defined elsewhere
    connection = (HttpURLConnection) url.openConnection();
    connection.setInstanceFollowRedirects(false);
    connection.setUseCaches(false);
    connection.setRequestMethod("GET");
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24");
    connection.connect();
    // InputStreamReader without a charset argument decodes bytes with the platform default charset
    bf = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    String line;
    while ((line = bf.readLine()) != null) {
        response += line; // readLine() strips the line terminators
    }
    bf.close(); // release the underlying stream
    connection.disconnect();
}
catch (Throwable ex)
{
    // any failure (network, cast, etc.) yields an empty response
    response = "";
}

This works perfectly fine and returns the content as required. I then drill down to the part of the page that I want to extract, which is:

10€ de réduction chez Asos be!

Java seems to handle the € fine, since it is an HTML entity. The word "réduction" is problematic, though; it is rendered as:

10€ de r�duction chez Asos be!

As you can see it is struggling to handle the "é" character.
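A minimal sketch of one plausible cause, assuming the page is served as ISO-8859-1 while its bytes end up decoded as UTF-8 (the post does not say which encoding the page actually uses). The class `EncodingMismatchDemo` and its method are hypothetical names for illustration. An entity such as `&euro;` is pure ASCII, so it decodes identically under both charsets, whereas the single byte 0xE9 that ISO-8859-1 uses for "é" is not valid UTF-8, and `new String(byte[], Charset)` replaces malformed input with U+FFFD, the � character.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encodes text as ISO-8859-1 bytes (as a Latin-1 server would send them),
    // then decodes those bytes as UTF-8, reproducing the mismatch.
    static String decodeLatin1BytesAsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        // Malformed UTF-8 sequences are replaced with U+FFFD by this constructor.
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The HTML entity is plain ASCII, so it survives the mismatch unchanged.
        String entity = "10&euro; de";
        // 'é' is the single byte 0xE9 in ISO-8859-1, which is malformed UTF-8.
        String word = "r\u00e9duction";

        System.out.println(decodeLatin1BytesAsUtf8(entity));
        System.out.println(decodeLatin1BytesAsUtf8(word));
    }
}
```

If the real page turned out to be UTF-8 and the platform default were something like windows-1252, the symptom would look different (typically "Ã©" rather than �), which is one way to tell the two situations apart.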

How do I go about solving this? I've been searching the internet and playing around with the code for the past few hours, but with no luck whatsoever! I'm very new to Java, so it's all very difficult to get my head around.

Thanks in advance.

Upvotes: 0

Views: 1419

Answers (1)

aalku

Reputation: 2878

That code is OK, but you might need to detect the character encoding of the response and pass it to the class that wraps the InputStream to produce a Reader (InputStreamReader, which has a constructor that accepts a Charset).
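A minimal sketch of that approach, assuming the server declares the encoding in the `charset` parameter of its `Content-Type` header (for example `text/html; charset=ISO-8859-1`). The class `CharsetAwareFetcher` and both method names are hypothetical. Falling back to ISO-8859-1 when no charset is declared is also an assumption (it was the HTTP/1.1 default for `text/*` in RFC 2616); a page can instead declare its encoding in a `<meta charset>` tag, which this sketch does not inspect.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class CharsetAwareFetcher {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the header
    // is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Fetches a page and decodes it with the charset the server declares.
    static String fetch(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        try {
            // getContentType() connects if needed and returns the raw header value
            Charset charset = charsetFromContentType(
                    connection.getContentType(), StandardCharsets.ISO_8859_1);
            StringBuilder response = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), charset))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    response.append(line).append('\n'); // keep line breaks
                }
            }
            return response.toString();
        } finally {
            connection.disconnect();
        }
    }
}
```

Using a `StringBuilder` instead of `response += line` avoids building a new String on every line, and try-with-resources closes the reader even if an exception is thrown.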

Otherwise, the problem is not in reading the response but in what you do with the response string afterwards.
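One way to check which side is at fault is to inspect the code points of the extracted string rather than trusting how a console displays it. A minimal sketch, assuming the text has already been extracted into a String; the class `OutputEncodingCheck`, its methods, and the sample string are hypothetical stand-ins. If "é" shows up as U+00E9, the String itself is correct and any � comes from how it is printed or written later; U+FFFD means the damage happened while decoding.

```java
import java.io.PrintStream;

public class OutputEncodingCheck {

    // Returns the code points of a string as "U+XXXX" tokens, so you can see
    // what the String really contains regardless of console rendering.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the text extracted from the page
        String extracted = "10\u20ac de r\u00e9duction chez Asos be!";

        // U+00E9 here means the String is fine; U+FFFD means decoding failed.
        System.out.println(codePoints("r\u00e9duction"));

        // Print through an explicitly UTF-8 stream instead of relying on the
        // platform default; the terminal itself must also be set to UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(extracted);
    }
}
```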

Upvotes: 1
