IceFire
IceFire

Reputation: 4147

Unequal binary char comparison when MySQL string fetched from PHP

I have some interesting issue. The following two codes do not produce the same output:

$result = $sql->QueryFetch("SELECT machinecodeSigned FROM ...");
echo bin2hex($result['machinecodeSigned']);

and

$result = $sql->QueryFetch("SELECT HEX(machinecodeSigned) FROM ...");
echo $result['machinecodeSigned'];

So, $sql is just some wrapper class and method QueryFetch internally just calls PHP standard functions for query and fetch to attain values.

I get two different results, though. For example, for some arbitrary input in my database, I get:

08c3bd79c3a0c2a66fc2bb375b6370c399c3acc3ba7bc2b8c2b203c39d70

and

08FD79E0A66FBB375B6370D9ECFA7BB8B203DD70

Ignoring case-sensitivity, the first output is nonsense while the other one is correct.

machinecodeSigned is a char(255) field that is latin-1 encoded and has collation latin-1 (which should not play a role, I assume).

What could be the reason that I get two different results? This used to yield the same results for years, but suddenly I had to change the code from version 1 to version 2 in order for it to produce the correct result. It seems, as if PHP does some arbitrary conversion of the bytes in the string.

Edit: It seems necessary to say that the field is not human-readable. In any case, since the second output is the correct one, feel free to convert the hexadecimal form to ASCII characters, if this helps you.

Edit: SHOW CREATE TABLE yields:

CREATE TABLE `user` (
 `ID` int(9) NOT NULL AUTO_INCREMENT,
 `machinecodeSigned` char(255) CHARACTER SET latin1 COLLATE latin1_bin    DEFAULT NULL
 PRIMARY KEY (`ID`),
) ENGINE=InnoDB AUTO_INCREMENT=10092 DEFAULT CHARSET=latin1 COLLATE=latin1_german2_ci

Upvotes: 1

Views: 259

Answers (1)

Rick James
Rick James

Reputation: 142540

char(255) CHARACTER SET latin1 COLLATE latin1_bin

will read/write bytes unchanged. It would be better to say BINARY(255), or perhaps something else.

If you tell the server that your client wants to talk in "utf8", and you SELECT that column, then MySQL will translate from latin1 (the charset of the data) to utf8 (the encoding you say the client wants). This leads to the longer hex string.

You say that phpmyadmin says "utf8" somewhere; that is probably the cause of the confusion.

If it had been stored as base64, there would be no confusion because base64 uses very few different characters, and they are encoded identically in latin1 and utf8. Furthermore, latin1_bin would have been appropriate. So, another explanation of what went wrong is the unwanted reconversion from base64 to binary.

MySQL's implementation of latin1_bin is simple and permissive -- all 256 bit values are simply stored and loaded, unchecked. This makes it virtually identical to BLOB and BINARY.

This is probably the base64_encode that should have been stored:

MDhGRDc5RTBBNjZGQkIzNzVCNjM3MEQ5RUNGQTdCQjhCMjAzREQ3MA==

Datatypes starting with VAR or ending with BLOB or TEXT are implemented via a 'length' field plus the bytes needed to represent the value.

On the other hand, CHAR and BINARY are fixed length, and padded by spaces (CHAR) or \0 (BINARY).

So, writing binary info to CHAR(255) actually may modify the data due to spaces appended.

Upvotes: 1

Related Questions