Reputation: 11146
I was reading about getBytes(), and the documentation states that it returns the resultant byte array.
But when I ran the following program, I found that it returns an array of Unicode symbols.
    public class GetBytesExample {
        public static void main(String[] args) {
            String str = "A";
            // Encode the string using the platform's default charset.
            byte[] array1 = str.getBytes();
            System.out.print("Default Charset encoding:");
            for (byte b : array1) {
                System.out.print(b);
            }
        }
    }
The above program prints:
Default Charset encoding:65
This 65 is equivalent to the Unicode representation of A. My question is: where are the bytes that the return type promises?
Upvotes: 2
Views: 1442
Reputation: 718788
This 65 is equivalent to Unicode representation of A
It is also equivalent to the UTF-8 representation of A.
It is also equivalent to the ASCII representation of A.
It is also equivalent to the ISO/IEC 8859-1 representation of A.
It so happens that A is encoded identically in many character encodings, and that this encoding matches its Unicode code point. This is not a coincidence; it is a result of the history of character set and character encoding standards.
My question is that where are the bytes whose return type is expected.
In the byte array, of course :-)
You are (just) misinterpreting them.
When you do this:
    for (byte b : array1) {
        System.out.print(b);
    }
you output a series of bytes as decimal numbers with no spaces between them. This is consistent with the way that Java distinguishes between text / character data and binary data. Bytes are binary. The getBytes() method gives a binary encoding (in some character set) of the text in the string. You are then formatting and printing the binary (one byte at a time) as decimal numbers.
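To make the individual bytes easier to tell apart, one option is to print them with separators or in hex. This is only a minimal sketch: the class name ByteDump and the helper toHex are made up for illustration, and the charset is fixed to UTF-8 so that the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteDump {
    // Hypothetical helper: formats each byte as two hex digits, space-separated.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative bytes are not sign-extended.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "AB".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes)); // [65, 66]
        System.out.println(toHex(bytes));           // 41 42
    }
}
```

With separators, it becomes clear that the output is a list of byte values rather than one number.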
If you want more evidence of this, replace the "A" literal with a literal containing (say) some Chinese characters, or any Unicode characters greater than \u00ff, expressed using \u syntax.
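As a rough illustration of that experiment, the sketch below encodes the single character \u4e2d (chosen arbitrarily) in two explicit charsets. The class and method names are made up for the example, and explicit charsets are used instead of the platform default so the output is reproducible.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NonAsciiBytes {
    // Hypothetical helper: encodes s in the given charset and lists the byte values.
    static String describe(String s, Charset cs) {
        return Arrays.toString(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // one character, code point U+4E2D

        // UTF-8 needs three bytes (0xE4 0xB8 0xAD), shown as signed values.
        System.out.println(describe(s, StandardCharsets.UTF_8));    // [-28, -72, -83]

        // UTF-16BE needs two bytes (0x4E 0x2D).
        System.out.println(describe(s, StandardCharsets.UTF_16BE)); // [78, 45]
    }
}
```

Here the byte values no longer coincide with the code point, and they differ from one encoding to the next.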
Upvotes: 2
Reputation: 43728
String.getBytes() returns the encoding of the string using the platform's default encoding. The result depends on the machine you run it on. If the platform encoding is UTF-8, ASCII, ISO-8859-1, or one of a few others, an 'A' will be encoded as 65 (aka 0x41).
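To take the platform out of the picture, you can pass a charset explicitly. The following is a small sketch (the class and method names are invented for illustration) comparing how 'A' comes out in a few standard charsets:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetComparison {
    // Hypothetical helper: encodes s in the given charset and lists the byte values.
    static String encode(String s, Charset cs) {
        return Arrays.toString(s.getBytes(cs));
    }

    public static void main(String[] args) {
        // Whatever charset the no-argument getBytes() would use on this machine.
        System.out.println("Platform default: " + Charset.defaultCharset());

        System.out.println(encode("A", StandardCharsets.US_ASCII));   // [65]
        System.out.println(encode("A", StandardCharsets.ISO_8859_1)); // [65]
        System.out.println(encode("A", StandardCharsets.UTF_8));      // [65]
        // UTF-16 uses two bytes per code unit, so the result differs.
        System.out.println(encode("A", StandardCharsets.UTF_16BE));   // [0, 65]
    }
}
```

Passing the charset explicitly also makes the program behave the same on every machine.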
Upvotes: 1
Reputation: 140318
There is no PrintStream.print(byte) overload, so the byte needs to be widened to invoke the method.
Per JLS 5.1.2:
19 specific conversions on primitive types are called the widening primitive conversions:
- byte to short, int, long, float, or double
- ...
There's no PrintStream.print(short) overload either.
The next most-specific one is PrintStream.print(int). So that's the one that's invoked, hence you are seeing the numeric value of the byte.
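The same resolution can be observed with a few stand-in overloads. In this sketch the describe methods are hypothetical and only mimic the int, long, and char overloads of PrintStream.print; the explicit cast to char shows how to get the character back instead of the number.

```java
class PrintOverloadDemo {
    // Hypothetical stand-ins for PrintStream.print(int), print(long), print(char).
    static String describe(int i) {
        return "int: " + i;
    }

    static String describe(long l) {
        return "long: " + l;
    }

    static String describe(char c) {
        return "char: " + c;
    }

    public static void main(String[] args) {
        byte b = 65;

        // byte widens to int (the most specific applicable overload);
        // byte to char is not a widening conversion, so describe(char) is skipped.
        System.out.println(describe(b));        // int: 65

        // An explicit cast selects the char overload.
        System.out.println(describe((char) b)); // char: A

        System.out.println(b);        // 65
        System.out.println((char) b); // A
    }
}
```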
Upvotes: 5