Reputation: 187
As the title says, I read content from an HTTP response:
InputStream is = response.getEntity().getContent();
String cw = IOUtils.toString(is);
byte[] b = cw.getBytes("Cp1250");
String x = StringUtils.newStringUtf8(b);
String content = new String(b, "UTF-8");
System.out.println(content);
I have tried plenty of variations. I am a little confused about which encoding names are correct when passed as strings: windows-1250 or Cp1250? UTF-8, utf-8, or utf8?
Upvotes: 5
Views: 40111
Reputation: 1
I find it better to use Scanner for reading in different charsets.
FileInputStream is = new FileInputStream(fileOrPath);
Scanner scanner = new Scanner(is, "cp1250");
String out = scanner.next();
The next() method then returns a String decoded from the charset passed to the Scanner.
Tested on Czech text, converting from "cp1250" to "UTF-8".
Upvotes: -1
Reputation: 109547
An encoding has a canonical (unique) name plus a number of aliases, and the names are case-insensitive. For instance, "UTF-8" is the canonical name, but some Java versions back it was "UTF8"; the name was changed to match common usage. The same goes for "Windows-1250", which you might also see in HTML pages. "Cp1250" (code page) is a Java-internal name.
In Java, byte[] is binary data and String (internally Unicode) is text. Converting between the two requires an encoding; the encoding argument is often optional, in which case the operating system default is used.
byte, InputStream, OutputStream <-> String, char, Reader, Writer
String cw = IOUtils.toString(is, "UTF-8"); // an InputStream is binary (bytes), so name the encoding of the incoming data
byte[] b = cw.getBytes("Cp1250");          // text -> bytes, encoded as Cp1250
String x = new String(b, "Cp1250");        // bytes -> text, decoded with the same encoding
String content = x;
System.out.println(content);
To make String universal with respect to encoding, it internally uses char, i.e. UTF-16. String constants are stored in the .class file as (modified) UTF-8, which is more compact.
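The point about canonical names and aliases can be checked directly with java.nio.charset.Charset. The following is only a small sketch; the class name CharsetNames and the helper canonicalName are made up for illustration, and it assumes the JRE ships the windows-1250 charset (standard JDK builds do).

```java
import java.nio.charset.Charset;

class CharsetNames {
    // Hypothetical helper: resolves a charset label (canonical name or alias,
    // in any letter case) to the canonical name the JRE uses.
    static String canonicalName(String label) {
        return Charset.forName(label).name();
    }

    public static void main(String[] args) {
        String[] labels = {"Cp1250", "windows-1250", "UTF8", "utf-8"};
        for (String label : labels) {
            // Aliases of the same charset print the same canonical name.
            System.out.println(label + " -> " + canonicalName(label));
        }
    }
}
```

So for the question as asked, "Cp1250" and "windows-1250" should be interchangeable, as should "UTF-8", "utf-8" and "UTF8"; an unknown label makes Charset.forName throw UnsupportedCharsetException.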
Upvotes: 3
Reputation: 108889
Assuming Apache Commons IO, use one of the methods that specifies an encoding:
String cw = IOUtils.toString(is, "windows-1250");
All strings are implicitly UTF-16 in Java. Other encodings are generally represented using byte arrays.
Upvotes: 1
Reputation: 1500525
You seem to think that a String object has an encoding. That's not correct. An encoding is used as part of the translation from binary data (a byte[] or InputStream) to text data (a String or char[] etc).
It's not clear what IOUtils.toString is doing, but it's almost certainly losing data or at least handling it inappropriately. If your data is originally in Windows-1250, then you should use an InputStreamReader wrapping the InputStream, specifying the charset in the InputStreamReader constructor call.
It's not clear where UTF-8 comes in - you might want to write out the data in UTF-8 afterwards, but the result of that would be byte[], not a string.
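A minimal sketch of the InputStreamReader approach, using only the standard library. The class name DecodeStream and the method readAll are invented for this example, and a ByteArrayInputStream stands in for the HTTP response stream from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class DecodeStream {
    // Hypothetical helper: decodes every byte of the stream with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x9A is the letter š (U+0161) in windows-1250.
        byte[] source = {(byte) 0x9A, 'k', 'o', 'd', 'a'};
        String text = readAll(new ByteArrayInputStream(source),
                              Charset.forName("windows-1250"));
        System.out.println(text.equals("\u0161koda"));
    }
}
```

The charset is supplied once, at the point where bytes become characters; after that the String carries no encoding of its own.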
Upvotes: 6
Reputation: 47729
You're converting backwards. You need to get the input data as a byte array and then use String(byteArray, "Cp1250") to create the String object. Then if you want UTF-8, use String.getBytes("UTF-8").
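As a rough illustration of that order of operations (decode the Cp1250 bytes first, encode to UTF-8 second), here is a sketch with a made-up class Transcode and helper cp1250ToUtf8. It passes Charset objects rather than name strings, which is a choice of this example, to avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    // Hypothetical helper: bytes (windows-1250) -> String -> bytes (UTF-8).
    static byte[] cp1250ToUtf8(byte[] cp1250Bytes) {
        String text = new String(cp1250Bytes, Charset.forName("windows-1250"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // š is the single byte 0x9A in windows-1250 and the two bytes 0xC5 0xA1 in UTF-8.
        byte[] utf8 = cp1250ToUtf8(new byte[] {(byte) 0x9A});
        for (byte b : utf8) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```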
Upvotes: 6