Reputation: 362
I have a PHP script which is supposed to return a UTF-8 encoded string. However, in Java I can't seem to compare it with Java's own internal string in any way.
If I print "OK" and response, they appear the same in the console. However, if I check equality with
if ( "OK".equals(response) ) {
the result is false. I printed out both in binary: response is 11101111 10111011 10111111 01001111 01001011, while the Java String "OK" is 01001111 01001011, which is clearly ASCII. I tried to convert it to UTF-8 in a few ways, but to no avail:
String result2 = new String("OK".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
and
String result2 = new String("OK".getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
Neither works; both still produce ASCII codes for some reason.
byte[] result2 = "OK".getBytes(StandardCharsets.UTF_8); System.out.print(new String(result2));
This also prints the correct "OK", but in binary it is still ASCII.
I've tried switching the communication to numbers instead, but 1 still does not equal 1: Integer.parseInt(response) returns a '"1" is not a String' error message, although in every other respect the response is recognised as a normal String.
I'm looking for a solution, preferably one where "OK" is converted to UTF-8 rather than response to ASCII, since I need to communicate with a PHP script along with 2 databases, all set to UTF-8. Java is started with the switch -Dfile.encoding=UTF8 to ensure national characters are not broken.
Upvotes: 0
Views: 112
Reputation: 4654
In UTF-8, all characters with codes 127 or less are encoded as a single byte. Therefore "OK" is the same two bytes in UTF-8 and in ASCII.
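To see this for yourself, here is a minimal sketch that compares the two encodings byte for byte. The class and method names (AsciiUtf8Demo, sameBytes) are made up for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiUtf8Demo {
    // True when the text encodes to identical bytes in UTF-8 and US-ASCII.
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        // "OK" only contains code points below 128, so both encodings agree.
        System.out.println(sameBytes("OK"));
        // A non-ASCII letter (e with acute accent) needs two bytes in UTF-8
        // and cannot be represented in US-ASCII at all.
        System.out.println(sameBytes("\u00E9"));
    }
}
```

This is why converting "OK" "to UTF-8" never changes its bytes: it already is valid UTF-8.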
The bytes 11101111 10111011 10111111 01001111 01001011 are not just a simple "OK"; they are 0xEF, 0xBB, 0xBF, "OK", where 0xEF, 0xBB, 0xBF are a BOM (byte order mark). The BOM is a sequence that editors do not display but that is used to determine the encoding.
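A small sketch of what happens on the Java side, using the five bytes from the question. Java's UTF-8 decoder keeps the BOM as the character U+FEFF, so the decoded string has three characters instead of two. The class name BomDecodeDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class BomDecodeDemo {
    // The five bytes printed in the question: EF BB BF 'O' 'K'.
    static final byte[] RESPONSE_BYTES = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x4F, 0x4B
    };

    static String decode() {
        return new String(RESPONSE_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String response = decode();
        System.out.println(response.length());              // 3, not 2
        System.out.println(response.charAt(0) == 0xFEFF);   // true: invisible BOM character
        System.out.println("OK".equals(response));          // false
    }
}
```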
Those bytes probably sit in your PHP script before the opening <?php tag. You have to configure your editor to save the file without a BOM.
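If you are not sure whether a given file was saved with a BOM, a rough check is to look at its first three bytes. The sketch below does that in Java; the class name BomFileCheck and the helper startsWithUtf8Bom are invented for this example, and the file path is taken from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomFileCheck {
    // True when the data begins with the UTF-8 BOM bytes EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect, e.g. the PHP script.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(startsWithUtf8Bom(data) ? "BOM found" : "no BOM");
    }
}
```

Reading the whole file is wasteful for large inputs, but it keeps the sketch short; for a PHP script this is unlikely to matter.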
UPD
If it is not possible to alter the PHP script, you can use a workaround:
// check whether the first character of the response is a BOM
if (!response.isEmpty() && (response.charAt(0) == 0xFEFF)) {
    // drop the first character
    response = response.substring(1);
}
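The same workaround can be wrapped in a small helper so that it is applied in one place before any comparison or parsing. This is only a sketch: the names BomStrip and stripBom are hypothetical, and the null check is an extra assumption that is not part of the snippet above.

```java
class BomStrip {
    // Returns the text without a leading U+FEFF; other input is returned unchanged.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // Simulated responses that start with the invisible BOM character.
        String response = "\uFEFFOK";
        String number = "\uFEFF1";

        System.out.println("OK".equals(response));            // false
        System.out.println("OK".equals(stripBom(response)));  // true
        System.out.println(Integer.parseInt(stripBom(number)) == 1); // true
    }
}
```

Only one leading BOM is removed, which matches the case described in the question.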
Upvotes: 4