Reputation: 1080
Hi! I have web page content encoded in ISO-8859-2. How do I convert a stream encoded in this charset to Java's UTF-8? I'm trying the code below, but it does not work: it messes up some characters. Is there another way to do this?
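BufferedInputStream inp = new BufferedInputStream(in);
byte[] buffer = new byte[8192];
int len1 = 0;
try {
    // Decode each chunk of bytes as ISO-8859-2 and append it to the result
    while ((len1 = inp.read(buffer)) != -1) {
        String buff = new String(buffer, 0, len1, "ISO-8859-2");
        stranica.append(buff);
    }
} catch (IOException e) {
    // handle or rethrow the read failure
}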
Upvotes: 1
Views: 14532
Reputation: 346260
How to convert a stream encoded in this charset to java's UTF-8
Wrong assumption: Java uses UTF-16 internally, not UTF-8.
But your code actually looks correct and should work. Are you absolutely sure the webpage is in fact encoded in ISO-8859-2? Maybe its encoding is declared incorrectly.
Or perhaps the real problem is not with the reading code that you've shown, but with whatever code you use to work with the result. How and where do these "messed up characters" manifest?
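To illustrate the UTF-16 point, here is a minimal sketch that separates the two steps: decoding ISO-8859-2 bytes into a String, and encoding that String as UTF-8 only when bytes are needed again (for a file or a response, say). The class and method names (CharsetRoundTrip, decodeLatin2, encodeUtf8) are made up for this example, and it assumes the JVM ships the ISO-8859-2 charset, which common JDKs do although the specification does not require it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Decode ISO-8859-2 bytes into a String (held as UTF-16 in memory).
    static String decodeLatin2(byte[] raw) {
        return new String(raw, Charset.forName("ISO-8859-2"));
    }

    // Encode a String as UTF-8 bytes, e.g. just before writing it out.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In ISO-8859-2, 0xB9 is U+0161 and 0xE8 is U+010D.
        byte[] raw = {(byte) 0xB9, (byte) 0xE8};
        String text = decodeLatin2(raw);
        byte[] utf8 = encodeUtf8(text);
        System.out.println("UTF-16 code units: " + text.length());
        System.out.println("UTF-8 bytes: " + utf8.length);
    }
}
```

If the decoded String holds the right code points but the page or console still shows garbage, the output side is probably using a different charset than the one it was given.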
Upvotes: 3
Reputation: 11513
Try it with an InputStreamReader and Charset:
InputStreamReader inp = new InputStreamReader(in, Charset.forName("ISO-8859-2"));
BufferedReader rd = new BufferedReader(inp);
String l;
while ((l = rd.readLine()) != null) {
...
}
If you get an UnsupportedCharsetException, you know what your problem is... Also, with inp.getEncoding() you can check which encoding is really used.
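A self-contained version of the reader approach might look like the sketch below. It reads the whole stream into a String instead of line by line. The class name ReadLatin2, the helper readAll, and the in-memory test stream are assumptions made for the example, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class ReadLatin2 {
    // Read the entire stream, decoding bytes with the named charset.
    // Charset.forName throws UnsupportedCharsetException for an unknown name.
    static String readAll(InputStream in, String charsetName) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader rd = new InputStreamReader(in, Charset.forName(charsetName))) {
            char[] buf = new char[8192];
            int n;
            // The reader handles decoding, so only chars are seen here.
            while ((n = rd.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xBE is U+017E in ISO-8859-2; 0x61 is plain ASCII 'a'.
        byte[] page = {(byte) 0xBE, 0x61};
        String text = readAll(new ByteArrayInputStream(page), "ISO-8859-2");
        System.out.println(text.length());
    }
}
```

Letting the reader own the decoding also stays safe if the charset is later swapped for a multi-byte one such as UTF-8, where decoding fixed-size byte chunks separately can split a character across two chunks.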
Upvotes: 4