satoshi

Reputation: 4113

String UTF8 encoding issue

The following simple test is failing:

assertEquals(myStringComingFromTheDB, "£");

Giving:

Expected :£
Actual   :Â£

I don't understand why this is happening, especially since it is the actual string (the one passed as the second argument) whose encoding is wrong. The Java file is saved as UTF-8.

The following code:

System.out.println(bytesToHex(myStringComingFromTheDB.getBytes()));
System.out.println(bytesToHex("£".getBytes()));

Outputs:

C2A3
C382C2A3

Can anyone explain why?

Thank you.

Update: I'm working under Windows 7.

Update 2: It's not related to JUnit. The following simple example:

byte[] bytes = "£".getBytes();
for(byte b : bytes)
{
    System.out.println(Integer.toHexString(b));
}

Outputs:

ffffffc3
ffffff82
ffffffc2
ffffffa3
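The leading ffffff in each line is not part of the encoding. Integer.toHexString takes an int, so each negative byte is sign-extended before it is formatted. A minimal sketch of the difference, where the class and method names are made up for illustration:

```java
class HexByteDemo {

    // Formats one byte as two lowercase hex digits. Masking with 0xFF
    // discards the sign-extended upper bits of the widened int.
    static String toHex(byte b) {
        return String.format("%02x", b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC3;
        System.out.println(Integer.toHexString(b));        // ffffffc3 (sign-extended)
        System.out.println(Integer.toHexString(b & 0xFF)); // c3 (masked)
        System.out.println(toHex(b));                      // c3
    }
}
```

With the mask applied, the four bytes above read c3 82 c2 a3, which matches the C382C2A3 printed earlier.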

Update 3: I'm working in IntelliJ IDEA. I already checked the options and the encoding is UTF-8. It is also shown in the bottom bar, and when I select and right-click the pound sign it says "Encoding (auto-detected): UTF-8".

Update 4: I opened the Java file with a hex editor and the pound sign is saved, correctly, as "C2A3".

Upvotes: 4

Views: 5096

Answers (1)

omnomnom

Reputation: 9149

Please note that assertEquals accepts parameters in the following order:

assertEquals(expected, actual)

so in your case the string coming from the DB is fine, but the one from your Java class is not (as you already noticed). My guess is that you copied £ from somewhere, probably along with some invisible characters around it which your editor (IDE) does not display. I have had similar issues a couple of times, especially when working on MS Windows, e.g. Ctrl+C and Ctrl+V from a website into the IDE.
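One way to test the hidden-character guess is to print every code point of the suspect string. This is only a sketch; the class name, the method name and the zero-width space used as a sample are assumptions for illustration, not something taken from the question:

```java
import java.util.stream.Collectors;

class CodePointDump {

    // Renders every code point of s as U+XXXX so that invisible or
    // unexpected characters become visible.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A clean pound sign is a single code point.
        System.out.println(describe("\u00A3"));
        // A zero-width space pasted along with it would show up as U+200B.
        System.out.println(describe("\u200B\u00A3"));
    }
}
```

If the literal in the Java class prints anything other than a single U+00A3, the extra code points show what was actually compiled into the string.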

(I printed the bytes of £ on my system with UTF-8 encoding and got C2A3):

for (byte b : "£".getBytes()) {
  // Mask with 0xFF so negative bytes are not sign-extended to ffffffXX.
  System.out.println(Integer.toHexString(b & 0xFF));
}

Another possibility is that your file is not really UTF-8 encoded. Do you work on Windows or some other OS?

Some other possible causes, based on the question edits:

1) It's possible that the IDE uses some other encoding. For Eclipse see this thread: http://www.eclipse.org/forums/index.php?t=msg&goto=543800&

2) If both the IDE settings and the final file encoding are fine, then it's a compiler issue. See: Java compiler platform file encoding problem
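The compiler explanation is consistent with the bytes in the question. If the UTF-8 bytes C2 A3 in the source file are decoded with a single-byte Western charset, the literal becomes two characters, and encoding those two characters as UTF-8 gives C3 82 C2 A3. The sketch below reproduces that under stated assumptions: ISO-8859-1 stands in for the Windows default charset (windows-1252 maps these two bytes to the same characters), and bytesToHex is a hypothetical stand-in for the helper used in the question.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {

    // Hypothetical stand-in for the bytesToHex helper used in the question.
    static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the UTF-8 bytes of s with the wrong charset (ISO-8859-1),
    // which mimics a compiler reading a UTF-8 source file as single-byte text.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The escape keeps this literal independent of the source file encoding.
        String pound = "\u00A3";
        String garbled = misdecode(pound);
        System.out.println(bytesToHex(pound.getBytes(StandardCharsets.UTF_8)));   // C2A3
        System.out.println(bytesToHex(garbled.getBytes(StandardCharsets.UTF_8))); // C382C2A3
    }
}
```

If this is the cause, passing -encoding UTF-8 to javac (or setting the equivalent option in the build tool) should make the literal compile as a single character. Writing the literal as "\u00A3" and calling getBytes with an explicit charset avoids the dependency on platform defaults altogether.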

Upvotes: 3
