Reputation: 1766
I'm trying to compare two strings, let's say Émilie and Zoey. Well, E comes before Z, but in byte order Z comes before É, so a normal if ( str1 > str2 ) won't work.
I tried with if (strcmp(str1,str2) > 0), but that still doesn't work. So I'm looking for a native way to compare strings with UTF-8 characters.
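A minimal sketch of why both attempts fail, assuming the PHP source file is saved as UTF-8. strcmp() and the < / > operators on non-numeric strings compare raw bytes. In UTF-8, É is encoded as the two bytes 0xC3 0x89, while Z is the single byte 0x5A, so Émilie sorts after Zoey.

```php
<?php
// Assumes this file is saved as UTF-8.
$a = 'Émilie';
$b = 'Zoey';

// 'É' is the two bytes 0xC3 0x89 in UTF-8; 'Z' is the single byte 0x5A.
echo bin2hex('É'), ' vs ', bin2hex('Z'), "\n"; // c389 vs 5a

// Both comparisons work byte by byte, and 0xC3 > 0x5A.
var_dump(strcmp($a, $b) > 0); // bool(true)
var_dump($a > $b);            // bool(true)
```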
Upvotes: 19
Views: 25739
Reputation: 48071
Directly modifying the values is not a good general-purpose technique. It is also inefficient to perform two string conversions on every comparison inside a sorting callback.
To improve on the efficiency of Marqitos's answer, call iconv() only once per element with array_map(), then sort the original array by that transliterated copy.
Code:
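<?php
// iconv()'s //TRANSLIT output can depend on the current locale; prefer a UTF-8 one if installed.
setlocale(LC_ALL, 'fr_FR.UTF-8', 'fr_FR');

$names = [
    'Zoey and another (word)',
    'Émilie and another word',
    'Amber',
];

// Sort the transliterated copy and reorder $names in step with it:
// each string is converted once, and the original values are kept.
array_multisort(
    array_map(fn($v) => iconv('utf-8', 'ascii//TRANSLIT', $v), $names),
    $names
);

var_export($names);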
var_export($names);
Output:
array (
  0 => 'Amber',
  1 => 'Émilie and another word',
  2 => 'Zoey and another (word)',
)
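To make the mechanism explicit: array_multisort() sorts its first array and applies the same reordering to the arrays that follow, so the transliterated copy only serves as a sort key. In this sketch the keys are hard-coded as hypothetical stand-ins for what iconv() would return, so the result does not depend on the installed locale.

```php
<?php
// Original values, which must stay untouched.
$names = ['Zoey and another (word)', 'Émilie and another word', 'Amber'];

// Hypothetical sort keys standing in for the iconv(..., 'ascii//TRANSLIT', ...) results,
// one per element and in the same order as $names.
$keys = ['Zoey and another (word)', 'Emilie and another word', 'Amber'];

// $keys is sorted ascending; $names is reordered exactly the same way.
array_multisort($keys, $names);

var_export($names);
```

Because only the keys are transliterated, the accented originals come out intact and in the order of their ASCII keys.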
Upvotes: 0
Reputation: 21
I recommend using the usort() function, which avoids modifying the values while still comparing them correctly.
Example:
<?php
setlocale(LC_ALL, 'fr_FR.UTF-8', 'fr_FR');

$names = [
    'Zoey and another (word)',
    'Émilie and another word',
    'Amber',
];

// Compare transliterated, punctuation-free copies; the elements of $names are not modified.
function compare(string $a, string $b): int {
    $a = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $a));
    $b = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $b));
    return strcmp($a, $b);
}

usort($names, 'compare');

echo '<pre>';
print_r($names);
echo '</pre>';
Result:
Array
(
    [0] => Amber
    [1] => Émilie and another word
    [2] => Zoey and another (word)
)
Upvotes: 1
Reputation: 3588
Here's something that works for me, although I'm not sure it will serve the purpose of comparing the special characters that other languages have.
I'm just using the mb_strpos() function and checking the result. I guess that is as close as you can get to a native comparison of UTF-8 strings:
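// Lowercase both sides so the match is case-insensitive for multibyte characters too.
if (mb_strpos(mb_strtolower($search_in), mb_strtolower($search_for)) !== false) {
    // do stuff
}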
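For reference, a small sketch of what this check actually does, using hypothetical sample strings: mb_strpos() returns the character offset of a substring (or false), so it answers "does one string contain the other?" rather than "which string comes first?". mb_strtolower() lowercases multibyte characters such as É correctly.

```php
<?php
// Hypothetical sample data.
$search_in  = 'Zoey et ÉMILIE';
$search_for = 'émilie';

// mb_strtolower() handles multibyte characters: 'ÉMILIE' becomes 'émilie'.
$haystack = mb_strtolower($search_in, 'UTF-8');
$needle   = mb_strtolower($search_for, 'UTF-8');

// The offset is counted in characters, not bytes.
$pos = mb_strpos($haystack, $needle, 0, 'UTF-8');
var_dump($pos); // int(8)
```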
Upvotes: -3
Reputation: 29137
IMPORTANT
This answer is meant for situations where it's not possible to install or enable the 'intl' extension, and it only sorts strings by replacing accented characters with non-accented ones. To sort accented characters according to a specific locale, using a Collator is a better approach; see the other answer to this question for more information.
Sorting by non-accented characters in PHP 5.2
You can try converting both strings to ASCII using iconv() with the //TRANSLIT option, which replaces accented characters with unaccented equivalents:
$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);
Then do the comparison.
See the documentation here:
http://www.php.net/manual/en/function.iconv.php
[updated, in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways. This problem is mentioned in this question: php iconv translit for removing accents: not working as excepted?
To make the iconv() approach work, I've added a code sample below that uses preg_replace() to strip every character that is neither a word character nor whitespace from the converted string.
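<?php
setlocale(LC_ALL, 'fr_FR.UTF-8', 'fr_FR');

$names = array(
    'Zoey and another (word)',
    'Émilie and another word',
    'Amber',
);

// Transliterate to ASCII, then drop characters that are neither word characters
// nor whitespace (parentheses, stray accent marks left by transliteration).
$converted = array();
foreach ($names as $name) {
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

sort($converted);
echo '<pre>'; print_r($converted);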
// Array
// (
// [0] => Amber
// [1] => Emilie and another word
// [2] => Zoey and another word
// )
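Since the question is about comparing two strings rather than sorting an array, the same normalization can be wrapped in a function that returns a negative, zero or positive integer, like strcmp(). This is only a sketch: the helper names to_ascii_key and compare_ascii are hypothetical, the code avoids closures and type declarations so it also runs on the older PHP versions this answer targets, and the result for accented input still depends on the locale and iconv implementation available.

```php
<?php
// Transliteration results can depend on the current locale; try UTF-8 locales first.
setlocale(LC_CTYPE, 'fr_FR.UTF-8', 'fr_FR', 'en_US.UTF-8');

// Hypothetical helper: transliterate to ASCII, then strip characters
// that are neither word characters nor whitespace.
function to_ascii_key($s)
{
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $s);
    return preg_replace('#[^\w\s]+#', '', (string) $ascii);
}

// Hypothetical helper: negative, zero or positive, like strcmp().
function compare_ascii($a, $b)
{
    return strcmp(to_ascii_key($a), to_ascii_key($b));
}

// Expected to print bool(true) when iconv() transliterates 'É' to 'E'.
var_dump(compare_ascii('Émilie', 'Zoey') < 0);
```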
Upvotes: 17
Reputation: 24576
There is no native way to do this; however, the intl extension (also available from PECL) provides the Collator class: http://php.net/manual/de/class.collator.php
$c = new Collator('fr_FR');
if ($c->compare('Émily', 'Zoey') < 0) {
    echo 'Émily < Zoey';
}
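With the intl extension loaded, the same Collator object can also sort a whole array in locale order, or serve as a usort() callback. A minimal sketch, assuming intl is installed; the names are taken from the question.

```php
<?php
// Requires the intl extension (class Collator).
$collator = new Collator('fr_FR');

$names = ['Zoey', 'Émilie', 'Amber'];

// Option 1: sort a copy in place using the locale's collation rules.
$sorted = $names;
$collator->sort($sorted);

// Option 2: use Collator::compare() as a usort() comparator.
$viaUsort = $names;
usort($viaUsort, [$collator, 'compare']);

print_r($sorted);   // order: Amber, Émilie, Zoey
print_r($viaUsort); // same order
```

Unlike the transliteration approaches, the collator never changes the strings; it only decides their order according to the locale's rules.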
Upvotes: 17