Reputation: 23
I'm writing a Java program and I want to read from a file, 64kb at a time, and convert each 64kb chunk to a String. I'm able to read the chunks and keep them in a vector, but when I try to convert them to String there are always more characters than there should be. For example:
Converting %Çì¢ with String s = new String(byte[], "UTF8") gives me %????
Converting %Çì¢ with String s = new String(byte[]) or new String(byte[], "Cp1252"), etc., gives me %Çì?¢, which would be perfect if it weren't for the ? . Can anyone help me? I've tried every way to convert byte[] to String :(
Upvotes: 0
Views: 124
Reputation: 121800
Don't use String for binary data. It cannot work.
Strings in Java at runtime are sequences of chars, and not all byte sequences can be converted to chars.
If you need a String representation of binary data, use a dedicated format which can do so (base64 comes to mind).
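To illustrate the base64 suggestion, here is a minimal sketch, assuming Java 8 or later (where java.util.Base64 is available). The class name, the helper names toText/fromText, and the sample bytes are made up for the example; the point is only that the bytes come back unchanged after passing through a String.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Encode arbitrary bytes as an ASCII-only Base64 string.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the exact original bytes from the Base64 string.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical sample: bytes that do not form valid UTF-8.
        byte[] data = {0x25, (byte) 0xC7, (byte) 0xEC, (byte) 0x8F, (byte) 0xA2};
        String text = toText(data);
        System.out.println(text);
        // The decoded bytes are identical to the input.
        System.out.println(Arrays.equals(data, fromText(text)));
    }
}
```

The encoded text is roughly a third longer than the raw data, which is the price for a representation that survives being stored in a String.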
See here for the full story; there is an example at the end showing why String for binary cannot work.
Here is some sample code which will yell at you (i.e., throw an exception) if your byte array cannot be converted to a string:
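import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// REPORT makes the decoder throw instead of silently substituting characters
final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.REPORT);
try {
    decoder.decode(ByteBuffer.wrap(yourByteArray));
} catch (CharacterCodingException e) {
    System.err.println("No can't do...");
    e.printStackTrace(System.err);
}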
By default, the String constructors replace malformed or unmappable byte sequences with a replacement character and don't raise an error.
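The silent replacement can be observed by decoding bytes and encoding them again. The following is only a sketch: the class name, the helper survivesString, and the sample bytes are invented for the illustration and are not taken from the question's file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    // Returns true only if the bytes come back unchanged after a
    // bytes -> String -> bytes trip through UTF-8.
    static boolean survivesString(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical binary data: 0xC7 is not followed by a continuation byte,
        // so the sequence is not valid UTF-8.
        byte[] binary = {0x25, (byte) 0xC7, (byte) 0xEC, (byte) 0xA2};
        // Plain ASCII text, which is also valid UTF-8.
        byte[] ascii = {87, 79, 87, 46, 46, 46};

        System.out.println(survivesString(binary)); // false: bytes were replaced
        System.out.println(survivesString(ascii));  // true
    }
}
```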
And of course, there is no guarantee that a valid byte sequence will fit in exactly 64k: a multi-byte character can straddle the boundary between two chunks.
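If the file really is text and still has to be processed in fixed-size chunks, one possible approach is to keep a single CharsetDecoder alive across chunks and carry any incomplete trailing bytes over to the next read. This is a sketch under assumptions, not a definitive implementation: the class and method names are hypothetical, UTF-8 is assumed as the encoding, and the chunk size must be at least 4 bytes so that a carried-over partial character always leaves room for new input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ChunkedDecode {
    // Decodes a UTF-8 stream chunk by chunk; throws CharacterCodingException
    // (an IOException) if the input is not valid UTF-8.
    static String decodeInChunks(InputStream in, int chunkSize) throws IOException {
        if (chunkSize < 4) {
            throw new IllegalArgumentException("chunkSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer chars = CharBuffer.allocate(chunkSize);
        StringBuilder out = new StringBuilder();

        boolean eof = false;
        while (!eof) {
            // Fill the free space after any bytes carried over from the last chunk.
            int n = in.read(bytes.array(), bytes.position(), bytes.remaining());
            if (n < 0) {
                eof = true;
            } else {
                bytes.position(bytes.position() + n);
            }
            bytes.flip();
            CoderResult result = decoder.decode(bytes, chars, eof);
            if (result.isError()) {
                result.throwException();
            }
            chars.flip();
            out.append(chars);
            chars.clear();
            // Keep the bytes of an incomplete character for the next round.
            bytes.compact();
        }
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample text with multi-byte characters, written with escapes.
        byte[] utf8 = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        // A tiny chunk size forces a character to be split across chunks.
        String text = decodeInChunks(new ByteArrayInputStream(utf8), 4);
        System.out.println(text.equals("h\u00e9llo \u20ac"));
    }
}
```

For the 64k case in the question the call would use a chunk size of 64 * 1024. For genuinely binary files this still fails by design, which is the point of the REPORT settings.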
Upvotes: 2
Reputation: 31648
I think the charset name UTF-8 needs a hyphen; try this:
String s = new String(myArray, "UTF-8");
EDIT
There might also be a problem with decoding each 64k chunk as UTF-8 separately. The byte order mark (BOM) will only be in the first bytes of the file, not in every 64k chunk. It might be better to use a Reader:
BufferedReader in = new BufferedReader(
    new InputStreamReader(new FileInputStream(myFile), "UTF-8"));
And you might also have a problem printing the String to the console if the console isn't UTF-8, so you have to change the encoding there too:
PrintStream out = new PrintStream(System.out, true, "UTF-8");
String str;
while ((str = in.readLine()) != null) {
    out.println(str);
}
Upvotes: 0
Reputation:
I don't know what those characters are supposed to be, but this is the proper way to use String with data that is UTF-8, if that's the actual encoding:
byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};
try
{
    String value = new String(byteArray, "UTF-8");
    System.out.println(value);
}
catch (UnsupportedEncodingException ex)
{
    // do something
}
Output:
WOW...
Upvotes: 0