S.T. Veje
S.T. Veje

Reputation: 143

Use different collation with each query

I have a table called 'names':

CREATE TABLE names (
    id              INT             UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    name            VARCHAR(255)    NOT NULL,
    sex             TINYINT         UNSIGNED,
    canon           BIT(1)          NOT NULL DEFAULT FALSE
  UNIQUE KEY uk_names (name, sex)
) ENGINE=InnoDB,
  CHARACTER SET utf8,
  COLLATE utf8_bin ;

Which contains a bunch of French names. Since they're French, many of them contain diacritics (é, à, è, ù, ï, ö, ü, â, ê, î, ô, û, and ç). So, for example, the names Leia, Léia, Leïa, and Léïa are all considered different names and, in the eyes of MySQL, unique.

This is exactly how it should be for my purposes.

However, when I run a query such as (just a simplified example):

SELECT * FROM names WHERE name = 'Leia'

I get a single row, corresponding to the exact spelling 'Leia' without any diacritics.

Or if I run this query:

SELECT * FROM names WHERE name LIKE '%Helene%'

I might get the names 'Helene' and 'Marie-Helene' but not 'Hélène' and 'Marie-Hélène'.

This is sometimes exactly what I want. Sometimes I need to search for an exact name, with an exact spelling. But other times I want to get all variant spellings of a name instead.

From what I can tell, this has to do with my chosen collation (utf8_bin). As I understand it, if I want all variant spellings to match, I need utf8_unicode_ci instead. Is that correct?

However, if I try to change my table to use utf8_unicode_ci, I get an error saying there are duplicate rows because of my unique constraint. This suggests that changing the collation of the table itself (and removing the unique constraint to do so) would give me the opposite problem: not being able to look up exact spellings of a name.

So how can I choose collation on the fly, for each individual query?

I tried running this before my query:

SET NAMES utf8 COLLATE utf8_unicode_ci

But that doesn't seem to do anything, and in the various answers I've seen regarding SET NAMES it seems to suggest that you should never use it anyway (it leads to security issues?)

By the way, I connect to my database in PHP (with PDO) using the DSN mysql:host=...;dbname=...;charset=utf8 in case that matters.

The only solution I can seem to think of, is to create a list of all possible alternate spellings of a name (in PHP probably), and then run something like:

SELECT * FROM names WHERE name IN ('Leia', 'Léia', 'Leïa', 'Léïa')

But since there are quite a few diacritics, it seems like it could potentially become quiet a long list for some names (Francois, François, Fràncois, Frànçois, Francoîs, Françoîs, Fràncoîs, Frànçoîs, Francöîs, Françöîs, ... and so on. I know some of those spellings don't actually exist, but my script would have no way of knowing which ones without testing every possibility against the database).

Surely there's a simpler and more elegant way to do what I want?

Upvotes: 2

Views: 436

Answers (2)

Rick James
Rick James

Reputation: 142453

If you are using MySQL 5.7 or older, utf8_unicode_520_ci may be better in some cases.

With 8.0, utf8mb4_0900_ai_ci is preferred.

"_ci" means case insensitive.

"_as_ci" means accent sensitive and case insensitive, as with 8.0's utf8mb4_0900_as_ci .

"_bin" means to just compare the bits -- all accents and cases are different.

Upvotes: 0

alirezadp10
alirezadp10

Reputation: 183

You must use this query:

SELECT * FROM names WHERE name LIKE _utf8'%Leia%' collate utf8_general_ci;

Upvotes: 2

Related Questions