Reputation: 1287
I have a problem converting a string to bytes in Java while porting my C# library to it. The string is converted, but the resulting byte array is not the same.
I use this code in C#:
string input = "Test ěščřžýáíé 1234";
Encoding encoding = Encoding.UTF8;
byte[] data = encoding.GetBytes(input);
And this code in Java:
String input = "Test ěščřžýáíé 1234";
String encoding = "UTF8";
byte[] data = input.getBytes(encoding);
The left one is the Java output and the right one is the C# output. How can I make the Java output the same as the C# one?
Upvotes: 1
Views: 1358
Reputation: 54907
In all likelihood, the byte arrays are the same. However, if you're formatting them to a string representation (e.g. to view them in a debugger), they will appear different, since the byte data type is treated as unsigned in C# (values 0–255) but signed in Java (values -128–127). Refer to this question and my answer for an explanation.
Edit: Based on this answer, you can print unsigned values in Java using:
byte b = -60;
System.out.println((short)(b & 0xFF)); // output: 196
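To check this against the question's input, here is a minimal sketch that encodes a single character from the test string ("ě", written as the escape \u011B so the source file's encoding does not matter) and shows both views of the same bytes. The class and method names (UnsignedByteView, toUnsigned) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class UnsignedByteView {
    // Maps each signed Java byte to its unsigned value (0..255),
    // which is how a C# byte would be displayed.
    static int[] toUnsigned(byte[] data) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i] & 0xFF; // mask keeps only the low 8 bits
        }
        return out;
    }

    public static void main(String[] args) {
        // "ě" (U+011B) is encoded in UTF-8 as the two bytes 0xC4 0x9B.
        byte[] data = "\u011B".getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(data));             // [-60, -101]
        System.out.println(Arrays.toString(toUnsigned(data))); // [196, 155]
    }
}
```

The bit patterns are identical in both lines of output; only the interpretation of the top bit differs.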
Upvotes: 3
Reputation: 121820
These arrays are very probably the same.
You are hit by a big difference between C# and Java: in Java, byte is signed, whereas in C# it is unsigned.
To dump the bytes, try this:
public void dumpBytesToStdout(final byte[] array)
{
    for (final byte b: array)
        System.out.printf("%02X\n", b);
}
Then write an equivalent dump method in C# (I don't know how; I don't use C#).
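If comparing two console dumps by eye is awkward, a variant that returns a single string may be easier to diff. The sketch below is only one possible layout: it joins uppercase hex pairs with hyphens, which, as far as I know, is the same layout C#'s BitConverter.ToString(byte[]) produces. The names HexDump and toHex are hypothetical.

```java
final class HexDump {
    // Formats the array as hyphen-separated uppercase hex pairs, e.g. "C4-9B".
    static String toHex(byte[] array) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < array.length; i++) {
            if (i > 0) {
                sb.append('-');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", array[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = { 0x54, -60, -101 }; // 'T' followed by UTF-8 "ě"
        System.out.println(toHex(sample));   // 54-C4-9B
    }
}
```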
Alternatively, if your dump function involves integer types larger than byte, for instance an int, do:
i & 0xff
to remove the sign bits. Note that if you cast byte -1, which reads:
1111 1111
to an int, this will NOT give:
0000 0000 0000 0000 0000 0000 1111 1111
but:
1111 1111 1111 1111 1111 1111 1111 1111
i.e., the sign bit is "carried" (sign extension); otherwise, the cast would yield the int value 255, which is not -1.
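The difference between a plain cast and masking can be seen in a short sketch. The class and method names here (SignExtensionDemo, widen, widenUnsigned) are illustrative only.

```java
final class SignExtensionDemo {
    // Plain widening conversion: the sign bit is copied into the upper 24 bits.
    static int widen(byte b) {
        return b;
    }

    // Widening followed by a mask: the upper 24 bits are cleared.
    static int widenUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = -1; // bit pattern 1111 1111

        System.out.println(widen(b));                                   // -1
        System.out.println(Integer.toBinaryString(widen(b)));           // 32 ones
        System.out.println(widenUnsigned(b));                           // 255
        System.out.println(Integer.toBinaryString(widenUnsigned(b)));   // 11111111
    }
}
```

Since Java 8, Byte.toUnsignedInt(b) gives the same result as b & 0xFF.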
Upvotes: 2