Raziza O

Reputation: 1646

Differences between Java's getBytes("UTF-8") and C#'s Encoding.UTF8.GetBytes()

I'm passing data between C# and Java, converting it in 4 stages:

  1. to byte array
  2. to string (simply adding each byte as character)
  3. to UTF8 bytes
  4. to base64 string

What I've found is that Java's conversion to UTF-8 gives a different result than C#'s.

I'll skip the base64 conversion in the code below.

Java code:

// The result is [-26, 16, 0, 0]
byte[] bytes = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(4326).array();

StringBuilder sb = new StringBuilder(bytes.length);
for (byte currByte : bytes) {
   sb.append((char) currByte);
}

// The result is [-17, -90, -66, 16, 0, 0]
byte[] utf8Bytes = sb.toString().getBytes("UTF-8");

C# code:

MemoryStream objMemoryStream = new MemoryStream();
BinaryWriter objBinaryWriter = new BinaryWriter(objMemoryStream);
objBinaryWriter.Write(4326);

// The result [230, 16, 0, 0]
byte[] objByte = objMemoryStream.ToArray();
StringBuilder objSB = new StringBuilder();
foreach (byte objCurrByte in objByte)
{
    objSB.Append((char)objCurrByte);
}
string strBytes = objSB.ToString();

objBinaryWriter.Close();
objBinaryWriter.Dispose();

// The result is [195, 166, 16, 0, 0]
var result = UTF8Encoding.UTF8.GetBytes(strBytes);

The two final arrays are different although the input arrays/strings are the same. (Java just displays bytes as signed, but the values are the same.)

I'm not allowed to change the C# code because it is already used by clients.

How can I adjust my Java code, and what is the problem in it?

Note: Java manages to read the base64 string produced by C#, but when it encodes the same data it generates a different string that C# cannot read properly.

Upvotes: 1

Views: 2953

Answers (1)

Peter Lawrey

Reputation: 533870

The problem is that char is unsigned but byte is signed. Casting a byte to char sign-extends it first, so (char) -26 is effectively (char) (-26 & 0xFFFF), which is 65510, whereas what you intended was (char) (-26 & 0xFF), which is 230.
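To make the sign extension concrete, here is a minimal sketch comparing the two casts. The class and method names (SignExtensionDemo, naiveCast, maskedCast) are made up for illustration and are not part of the question's code.

```java
public class SignExtensionDemo {
    // Plain cast: the byte is widened to int with sign extension,
    // then narrowed to char, so negative bytes land in 0xFF80..0xFFFF.
    static int naiveCast(byte b) {
        return (char) b;
    }

    // Masked cast: keeping only the low 8 bits yields a value in 0..255.
    static int maskedCast(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 230; // stored as -26 in a signed Java byte
        System.out.println(naiveCast(b));  // 65510 (0xFFE6)
        System.out.println(maskedCast(b)); // 230   (0x00E6)
    }
}
```

Since U+FFE6 needs three bytes in UTF-8 and U+00E6 needs two, this is where the Java output gains an extra byte compared with C#.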

Try

for (byte currByte : bytes) {
   sb.append((char) (currByte & 0xFF)); // -26 => 230 not 65510
}
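Putting the fix into the four stages from the question might look like the sketch below. The names Utf8Interop, bytesToString, bytesToStringLatin1 and encode are hypothetical, and the base64 step assumes the C# side uses standard base64 with padding (for example Convert.ToBase64String), which the question does not show.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Utf8Interop {
    // Stage 2: one char per byte, masked so each char stays in 0..255.
    static String bytesToString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Equivalent alternative: ISO-8859-1 maps each byte to the char
    // with the same unsigned value, so no manual loop is needed.
    static String bytesToStringLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Stages 1-4. The base64 flavor is an assumption about the C# side.
    static String encode(int value) {
        byte[] raw = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
        byte[] utf8 = bytesToString(raw).getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    public static void main(String[] args) {
        byte[] raw = {-26, 16, 0, 0};
        byte[] utf8 = bytesToString(raw).getBytes(StandardCharsets.UTF_8);
        // Same bytes as the C# result [195, 166, 16, 0, 0], shown signed.
        System.out.println(Arrays.toString(utf8)); // [-61, -90, 16, 0, 0]
        System.out.println(encode(4326));
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.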

Upvotes: 1
