zahir hussain

Reputation: 3739

String received with utf8 format doesn't get displayed correctly

I want to know how to read a string in Java from a file that contains letters from different languages.

I decoded the input as UTF-8. Letters from some languages come through correctly, but Latin letters are not displayed correctly.

So, how can I read letters from all languages correctly?

Alternatively, is there another encoding that would let me read letters from all languages?

Here's my code:

URL url = new URL("http://google.cm");

URLConnection urlc = url.openConnection();
// Decode the response bytes as UTF-8, whatever the server actually sends
BufferedReader buffer = new BufferedReader(new InputStreamReader(urlc.getInputStream(), "UTF-8"));
StringBuilder builder = new StringBuilder();
int charRead; // read() returns one decoded character, or -1 at end of stream
while ((charRead = buffer.read()) != -1)
{
    builder.append((char) charRead);
}

buffer.close();

text = builder.toString();

When I display "text", the letters are not shown correctly.

Upvotes: 1

Views: 1227

Answers (1)

Matthew Flaschen

Reputation: 284786

Reading a UTF-8 file is fairly simple in Java:

Reader r = new InputStreamReader(new FileInputStream(filename), "UTF-8"); 

If that isn't working, the issue lies elsewhere.

EDIT: According to iconv, Google Cameroon is serving invalid UTF-8. The response actually appears to be ISO-8859-1.

EDIT2: Actually, I was wrong. It serves (and declares) valid UTF-8 if the user agent contains "Mozilla/5.0" (or higher), but valid ISO-8859-1 in (some) other cases. So the best bet is to call getContentType on the URLConnection and check the declared charset before decoding.
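
As a rough sketch of that check (not a definitive implementation): read the charset parameter from the Content-Type header and pass it to InputStreamReader instead of hard-coding "UTF-8". The class name CharsetAwareFetch and the helpers charsetOf and fetch are made up for illustration, and falling back to ISO-8859-1 when no charset is declared is my assumption, not something the server guarantees.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class CharsetAwareFetch {

    // Pulls the charset parameter out of a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the header
    // is missing, has no charset parameter, or names an unsupported charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Fetches the page and decodes it with the charset the server declared.
    static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        // Assumption: use ISO-8859-1 if the server declares nothing
        Charset charset = charsetOf(conn.getContentType(), StandardCharsets.ISO_8859_1);
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

With this, the question's code would become something like String text = CharsetAwareFetch.fetch("http://google.cm"); and the decoding follows whatever the response declares for the user agent that was sent.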

Upvotes: 2
