poudigne

Reputation: 1766

Sort array of UTF-8 strings so that un/accented letters are treated equally

I'm trying to compare two strings, let's say Émilie and Zoey. E comes before Z, but by byte value Z comes before É (É isn't even in the ASCII table), so a plain if ($str1 > $str2) won't work.

I also tried if (strcmp($str1, $str2) > 0), but that doesn't work either. So I'm looking for a native way to compare strings that contain UTF-8 characters.
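To see why both attempts fail: for non-numeric strings, PHP's comparison operators, strcmp() and sort() compare raw bytes. In UTF-8 the letter É is stored as the two bytes 0xC3 0x89, and 0xC3 is greater than the byte for Z (0x5A), so any string starting with É orders after any string starting with Z. A small sketch, assuming the script file is saved as UTF-8:

```php
<?php
// Assumes this file is saved as UTF-8, so 'É' is the two bytes 0xC3 0x89.
echo bin2hex('É'), "\n"; // c389
echo bin2hex('Z'), "\n"; // 5a

// strcmp() compares raw bytes: 0xC3 > 0x5A, so 'Émilie' sorts after 'Zoey'.
var_dump(strcmp('Émilie', 'Zoey') > 0); // bool(true)

// sort() with default flags uses the same byte-wise comparison for
// non-numeric strings, so the accented name ends up last.
$names = ['Zoey', 'Émilie', 'Amber'];
sort($names);
echo implode(', ', $names), "\n"; // Amber, Zoey, Émilie
```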

Upvotes: 19

Views: 25739

Answers (5)

mickmackusa

Reputation: 48071

Directly modifying the values is not a good general-purpose technique. It is also inefficient to run two string conversions on every call of a sorting callback: the callback runs once per comparison, so each value gets converted many times over.

To make Marqitos's answer more efficient, call iconv() only once per value (via array_map()) and sort the original array by that converted copy with array_multisort().

Code:

setlocale(LC_ALL, 'fr_FR');

$names = [
   'Zoey and another (word)',
   'Émilie and another word',
   'Amber'
];

array_multisort(
    array_map(fn($v) => iconv('utf-8', 'ascii//TRANSLIT', $v), $names),
    $names
);

var_export($names);

Output:

array (
  0 => 'Amber',
  1 => 'Émilie and another word',
  2 => 'Zoey and another (word)',
)
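The output of iconv()'s //TRANSLIT varies between iconv implementations and locales, so if it misbehaves on your system, the same compute-the-keys-once pattern works with any key function. A minimal sketch, assuming a small hand-written accent map; fold_accents() is a hypothetical helper and only covers a few French letters:

```php
<?php
// Hypothetical helper: folds a handful of accented Latin letters to ASCII.
// The map is deliberately incomplete; extend it for the letters your data uses.
function fold_accents(string $s): string
{
    static $map = [
        'À' => 'A', 'Â' => 'A', 'Ä' => 'A', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'Ç' => 'C', 'ç' => 'c',
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'Î' => 'I', 'Ï' => 'I', 'î' => 'i', 'ï' => 'i',
        'Ô' => 'O', 'Ö' => 'O', 'ô' => 'o', 'ö' => 'o',
        'Ù' => 'U', 'Û' => 'U', 'Ü' => 'U', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
    ];
    // strtr() with an array replaces whole UTF-8 sequences, longest match first.
    return strtr($s, $map);
}

$names = [
    'Zoey and another (word)',
    'Émilie and another word',
    'Amber',
];

// Compute each sort key once (not once per comparison), then let
// array_multisort() reorder $names to match the sorted keys.
$keys = array_map('fold_accents', $names);
array_multisort($keys, SORT_STRING, $names);

echo implode("\n", $names), "\n";
```

Passing SORT_STRING | SORT_FLAG_CASE instead of SORT_STRING makes the ordering case-insensitive.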

Upvotes: 0

Marqitos

Reputation: 21

I recommend using the usort() function: it avoids modifying the values while still comparing them correctly.

Example:

<?php

setlocale(LC_ALL, 'fr_FR');

$names = [
   'Zoey and another (word)',
   'Émilie and another word',
   'Amber'
];

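function compare(string $a, string $b): int {
    // Transliterate to ASCII, then drop everything that is not a word
    // character or whitespace (e.g. stray apostrophes or parentheses).
    $a = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $a));
    $b = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $b));

    return strcmp($a, $b);
}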

usort($names, 'compare');

echo '<pre>';
print_r($names);
echo '</pre>';

Result:

Array
(
    [0] => Amber
    [1] => Émilie and another word
    [2] => Zoey and another (word)
)

Upvotes: 1

mmvsbg

Reputation: 3588

Here's something that works for me, although I'm not sure it will handle the special characters other languages have.

I'm just using the mb_strpos() function and checking the result. I guess that's as close as you can get to a native comparison of UTF-8 strings:

// Lowercase both sides so the substring check is case-insensitive
if (mb_strpos(mb_strtolower($search_in), mb_strtolower($search_for)) !== false) {
    //do stuff
}

Upvotes: -3

thaJeztah

Reputation: 29137

IMPORTANT

This answer is meant for situations where it's not possible to run or install the 'intl' extension, and it only sorts strings by replacing accented characters with their non-accented equivalents. To sort accented characters according to a specific locale, using a Collator is a better approach; see the other answer to this question for more information.

Sorting by non-accented characters in PHP 5.2

You may try converting both strings to ASCII using iconv() with the //TRANSLIT option to get rid of accented characters:

$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);

Then compare the converted strings, e.g. with strcmp().

See the documentation here:

http://www.php.net/manual/en/function.iconv.php

[Update, in response to @Esailija's remark] I overlooked that //TRANSLIT can transliterate accented characters in unexpected ways (depending on the iconv implementation, é may come out as 'e, with a stray apostrophe). This problem is described in this question: php iconv translit for removing accents: not working as excepted?

To make the iconv() approach work, I've added a code sample below that uses preg_replace() to strip every character that is neither a word character nor whitespace from the converted string.

<?php

setlocale(LC_ALL, 'fr_FR');

$names = array(
   'Zoey and another (word) ',
   'Émilie and another word',
   'Amber',
);


$converted = array();

foreach($names as $name) {
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

sort($converted);

echo '<pre>'; print_r($converted);

// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word 
// )

Upvotes: 17

Fabian Schmengler

Reputation: 24576

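There is no native way to do this with PHP's core string functions, but the intl extension (available from PECL and bundled with PHP since 5.3) provides the Collator class: http://php.net/manual/de/class.collator.php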

$c = new Collator('fr_FR');
if ($c->compare('Émily', 'Zoey') < 0) { echo 'Émily < Zoey'; }
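Collator can also sort a whole array by the locale's collation rules, which is what the question ultimately needs. A minimal sketch, assuming the intl extension is enabled:

```php
<?php
// Requires the intl extension. Collator applies the collation rules of
// the given locale, so 'É' is ordered with 'E' rather than after 'Z'.
$collator = new Collator('fr_FR');

$names = ['Zoey', 'Émilie', 'Amber'];
$collator->sort($names); // sorts in place, like sort()
echo implode(', ', $names), "\n"; // Amber, Émilie, Zoey

// Collator::compare() also works as a usort() callback, e.g. when the
// collation has to be combined with other comparison logic.
$copy = ['Zoey', 'Émilie', 'Amber'];
usort($copy, [$collator, 'compare']);
echo implode(', ', $copy), "\n"; // Amber, Émilie, Zoey
```

Collator::asort() is the key-preserving counterpart, analogous to asort().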

Upvotes: 17
