Reputation: 579
I’m a little confused by the collations. Not sure if the DB would traduce a column collation to the table collation on a SELECT
, or is just a ruleset for when comparing.
So what to put as CHARSET
and COLLATE
? (10.4.11-MariaDB)
Here are some examples of what I have:
Case #1: The utf8_bin
column I just SELECT
it, not compare it, but the ascii
I do WHERE bot=?
CREATE TABLE `bots_trace` (
`id` int(10) UNSIGNED NOT NULL,
`bot` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`info` varchar(2000) COLLATE utf8_bin DEFAULT NULL,
`seen` enum('yes','no') CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT 'no'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
I almost never ask the DB to do an utf8mb4_bin
comparison or similar, just SELECT
.
So what collations I should use in those cases, what to use as DEFAULT
and as COLLATE
Case #2: The only time I ask the DB to do something with an uft8mb4 is to check the mail.
CREATE TABLE `changed_email` (
`id` int(10) UNSIGNED NOT NULL,
`old_mail` varchar(256) COLLATE utf8mb4_bin NOT NULL,
`ctime` int(10) UNSIGNED NOT NULL,
`ip` varchar(94) CHARACTER SET ascii COLLATE ascii_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
SELECT id FROM changed_email WHERE old_mail = ? LIMIT 1
What to do in this case? Because the only comparison I do is a utf8mb4_bin
I'm assuming that would be the correct CHARSET
& COLLATE
.
Also, I use PHP and I set mysqli_set_charset($link, 'utf8mb4')
, which I needed to retrieve the data correctly, if I change some table COLLATION
to ascii
, could I have trouble retrieving utf8mb4
data columns?
Upvotes: 0
Views: 154
Reputation: 142540
ascii
encoding is a subset of utf8
which is a subset of utf8mb4
. But that is probably irrelevant.
mysqli_set_charset()
announces the CHARACTER SET
of the data in the client.
MySQL, during INSERT
will convert the bytes from the encoding indicated by mysqli_set_charset
to the encoding specified for the column in the table. Similarly SELECT
will convert the other direction.
If you are only dealing with ascii characters, then there is effectively no conversion, and no possibility of problems. If, on the other hand, you have accented letters or Emoji, there will be problems.
The above talks about CHARACTER SET
, which is the "encoding" of letters. the COLLATION
is a different matter; this term refers to the ordering, including case folding and accent stripping. For example, should 'a' = 'A'
or not? For COLLATION
ascii_general_ci or utf8mb4...ci, those are "equal". For any collation ...bin
they are "not equal", and one of them will consistently be sorted (think ORDER BY
) before the other.
In some, but not all, situations, MySQL will allow mixing character sets or collections, and "do the right thing". For example, storing a character in once CHARACTER SET
into another, either it can be converted, or it will mess up. A
is available in perhaps all character sets, but, for example, an accented A
is not available in Ascii.
In the case of COLLATION
, when there is a conflict of collections, there may be a rule that says which collation to use, but often it gives up and complains about a "mix of collections".
Keep in mind that all of this comes from multiple places:
A common example is latin1
accented letters cannot be interpreted as utf8 bytes, but they can be converted to utf8. This raises its ugly head when connection specification disagrees with the bytes in the client.
Upvotes: 1