Transilerate accents in MySql like LIKE

Question

I have a very long list of users. I want to group them by first letter of name. If the first letter is not a letter, it's grouped under #, so I have max 27 groups, for a-z + #.

I want to show only the gorup labels (e.g. F) if it will have results, and for every letter I want to know how many results it will have. So I do a single GROUP query to count all groups:

SELECT
  IF(lastname REGEXP '^[a-z]', UPPER(SUBSTRING(lastname, 1, 1)), '#') first_char,
  COUNT(1) num_users
GROUP BY first_char

That seems to work, BUT using REGEXP means that Ö isn't an O, but a #. That's a problem, because LIKE does find 'Ö' = 'O', so it will be in the O group when I name LIKE 'O%'. I could use REGEXP in the results query too, but I rather file Ö under O.

So the LIKE query works perfectly, but the GROUP query doesn't. How do I do exactly what LIKE does during comparisons?, so the group numbers and results always perfectly match.

Or another way to count correctly?

edit 1

Using LIKE a OR LIKE b OR .. OR LIKE z in the IF doesn't even work, because then the group might be Ö instead of O. The numbers will be correct, but the group label won't be. I really need a conversion...

edit 2

Thanks to @mpen. lastname REGEXP '^[[:alpha:]]' is shorter than 26 LIKEs, but the Ö label problem remains. Converting that outside MySQL is easy though.

mpen · Accepted Answer

You can do the grouping like this:

select
    IF(name REGEXP '^[[:alpha:]]', UPPER(SUBSTRING(name, 1, 1)), '#') first_char,
    COUNT(1) num_users
from _grouptest
group by first_char

And then remove the accents in your scripting language of choice, or if you're brave, you can attempt to remove them in pure MySQL.

_.deburr in JS
Str::removeDiacritics from my PHP lib ptilz which was yoinked from WordPress

Transilerate accents in MySql like LIKE

Answers (1)

Related Questions