Roman Susi
Roman Susi

Reputation: 4199

Multiple collations for the same MariaDB column?

How to perform ORDER BY for string columns efficiently, when different collations are required, depending on the culture? That is, data for the users from different cultures are stored in the same table and the same column, but each user naturally wants to see it ordered according to it's locale (locale is of course known and fixed per each row in the table). And table can be very long, so column needs and index and can't be post-processed on the application side to the desired collation (it's database task to do heavy-lifting, right?).

For example, the utf8_general_ci produces different results from utf8_swedish_ci.

While I think the problem should be evident for any international project, I can't find any suitable solutions out there. Myself I can imaging only the following solutions, which are not that nice and I doubt nothing better can be done:

  1. Use a separate field for each collation
  2. Maybe, a view can be created per culture and indexed accordingly (I have not worked with MariaDB views though, so this is quite theoretical)
  3. Use a separate "surrogate" field just for collation, maybe VIRTUAL

Now, if there were only one sortable string column, but there may be several. What is the intended and right way to solve this?

Upvotes: 1

Views: 697

Answers (1)

Anse
Anse

Reputation: 1660

As long as you use the same character set (utf8 in your case) for column storage as well as for reading, you can use COLLATE some-utf8-collation after the ORDER BY column-name clause:

SELECT * FROM sometable ORDER BY somecolumn COLLATE utf8_swedish_ci

In my tests this produces different sorting than a german collation:

SELECT * FROM sometable ORDER BY somecolumn COLLATE utf8_german2_ci

Well, as long as the data contains relevant characters, e.g. german umlauts üöä. If not, you won't see a difference.

Multiple columns in the ORDER clause each get their own COLLATE term:

SELECT * FROM sometable
ORDER BY
    somecolumn COLLATE utf8_german2_ci,
    secondcolumn COLLATE utf8_german2_ci

Upvotes: 2

Related Questions