nitotm
nitotm

Reputation: 579

Proper configuration to mix Collations in SQL?

I’m a little confused by the collations. Not sure if the DB would traduce a column collation to the table collation on a SELECT, or is just a ruleset for when comparing. So what to put as CHARSET and COLLATE? (10.4.11-MariaDB)

Here are some examples of what I have:
Case #1: The utf8_bin column I just SELECT it, not compare it, but the ascii I do WHERE bot=?

CREATE TABLE `bots_trace` (
  `id` int(10) UNSIGNED NOT NULL,
  `bot` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `info` varchar(2000) COLLATE utf8_bin DEFAULT NULL,
  `seen` enum('yes','no') CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT 'no'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

I almost never ask the DB to do an utf8mb4_bin comparison or similar, just SELECT. So what collations I should use in those cases, what to use as DEFAULT and as COLLATE

Case #2: The only time I ask the DB to do something with an uft8mb4 is to check the mail.

 CREATE TABLE `changed_email` (
      `id` int(10) UNSIGNED NOT NULL,
      `old_mail` varchar(256) COLLATE utf8mb4_bin NOT NULL,
      `ctime` int(10) UNSIGNED NOT NULL,
      `ip` varchar(94) CHARACTER SET ascii COLLATE ascii_bin NOT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

SELECT id FROM changed_email WHERE old_mail = ? LIMIT 1

What to do in this case? Because the only comparison I do is a utf8mb4_bin I'm assuming that would be the correct CHARSET & COLLATE.

Also, I use PHP and I set mysqli_set_charset($link, 'utf8mb4'), which I needed to retrieve the data correctly, if I change some table COLLATION to ascii, could I have trouble retrieving utf8mb4 data columns?

Upvotes: 0

Views: 154

Answers (1)

Rick James
Rick James

Reputation: 142540

ascii encoding is a subset of utf8 which is a subset of utf8mb4. But that is probably irrelevant.

mysqli_set_charset() announces the CHARACTER SET of the data in the client.

MySQL, during INSERT will convert the bytes from the encoding indicated by mysqli_set_charset to the encoding specified for the column in the table. Similarly SELECT will convert the other direction.

If you are only dealing with ascii characters, then there is effectively no conversion, and no possibility of problems. If, on the other hand, you have accented letters or Emoji, there will be problems.

The above talks about CHARACTER SET, which is the "encoding" of letters. the COLLATION is a different matter; this term refers to the ordering, including case folding and accent stripping. For example, should 'a' = 'A' or not? For COLLATION ascii_general_ci or utf8mb4...ci, those are "equal". For any collation ...bin they are "not equal", and one of them will consistently be sorted (think ORDER BY) before the other.

In some, but not all, situations, MySQL will allow mixing character sets or collections, and "do the right thing". For example, storing a character in once CHARACTER SET into another, either it can be converted, or it will mess up. A is available in perhaps all character sets, but, for example, an accented A is not available in Ascii.

In the case of COLLATION, when there is a conflict of collections, there may be a rule that says which collation to use, but often it gives up and complains about a "mix of collections".

Keep in mind that all of this comes from multiple places:

  • The column definition
  • The connection parameters (between client and MySQL server)
  • The bytes in the client.

A common example is latin1 accented letters cannot be interpreted as utf8 bytes, but they can be converted to utf8. This raises its ugly head when connection specification disagrees with the bytes in the client.

Upvotes: 1

Related Questions