Bobby Jack
Bobby Jack

Reputation: 16018

How can I control UTF-8 ordering in MySQL?

For example, this is the order that a standard 'SORT BY name' query is returning:

name
------
Cebbb
Čebbc
Cebbd

I'd like to SORT BY name and get the accented character first, grouped with other accented characters, e.g.

name
------
Čebbc
Cebbb
Cebbd

By default, MySql treats Č as if it were C, for sorting purposes, and sorts in the order shown.

Alternatively, is there a way, in PHP, that I can 'convert' Č to C, for comparison purposes?

Upvotes: 7

Views: 10964

Answers (4)

Álvaro González
Álvaro González

Reputation: 146340

The simplest way is to apply a proper collation to the column itself, e.g.:

CREATE TABLE foo (
    foo_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(30) NOT NULL COLLATE 'utf8_spanish_ci',
    PRIMARY KEY (`foo_id`)
)
COLLATE='utf8_spanish_ci'
ENGINE=InnoDB;

Update:

Č isn't a Spanish letter:

In Croatian, Slovenian, Bosnian, Skolt Sami, and Lakota alphabets, it is the fourth letter of the alphabet. In Czech, Northern Sámi alphabet and the Baltic languages Lithuanian and Latvian, the letter is in fifth place. In Slovak it is sixth letter of the alphabet. It is also used in Pashto (equivalent to چ‎) and Saanich.

Unlike numbers, letters don't have a fixed absolute order. The whole concept of alphabetic order is relative—you have to pick a rule set first. Run this SQL query to know which ones are available in your MySQL server:

SHOW CHARACTER SET

... and chose the one that better fits your expectations. There's a brief description in the manual.

Upvotes: 2

Bobby Jack
Bobby Jack

Reputation: 16018

OK, here's my solution to what is, admittedly, quite a special-case:

ORDER BY SUBSTRING(name, 1, 1),
         BINARY SUBSTRING(name, 1, 1),
         name

That's certainly not the most efficient query - in this case, though, that's not too big a concern - but it does solve my problem by:

  1. Ordering by the initial character (in which MySql will treat Č and C as identical)
  2. Then ordering by the binary value of the initial character, which will differentiate Č and C
  3. Then ordering by the full name, which will - in effect - order by the rest of the string.

This will order non-optimally after the first character, but that's not a concern.

Upvotes: 0

Yogu
Yogu

Reputation: 9445

To replace special characters like those with accents or umlauts by ordinary latin characters, you can use the php function iconv:

iconv("UTF-8", "ASCII//TRANSLIT", $text)

This will convert the variable $text into pure ASCII characters. For example, müßig would be converted into muessig, and caffée into caffee.

Upvotes: 0

Yogu
Yogu

Reputation: 9445

You can add a COLLATE expression to your ORDER BY clause:

SELECT k
FROM t1
ORDER BY k COLLATE utf8_spanish_ci;

Then, the column will be sorted using the correct collate, and the column definition can stay as it is.

Upvotes: 10

Related Questions