Reputation: 387
I want to implement a simple Arabic to English transliteration. I have defined a mapping array like the following:
$mapping = array('ﺏ' => 'b', 'ﺕ' => 't', ...)
I expect the following code to convert an Arabic string to its corresponding transliteration
$str = "رضي الدين";
$strlen = mb_strlen( $str, "UTF-8" );
for( $i = 0; $i < $strlen; $i++ ) {
$char = mb_substr( $str, $i, 1, "UTF-8" );
echo bin2hex($char); // 'd8b1' for ﺭ
// echo $mapping["$char"];
}
But $char does not match the keys. How can this be solved?
The source code is loaded in UTF-8.
EDIT
When I call bin2hex() on each key of $mapping, I get values different from those I get for the corresponding $char. For example, for ﺭ I get efbaad from the key and d8b1 from $char. They obviously don't match, so the characters are not converted.
foreach ($mapping as $k => $v) {
echo $k . ' ' . bin2hex($k) . '<br>'; // 'efbaad' for ﺭ
}
Only 'ي' yields the same value in both cases and is converted.
I do not know what's the problem!
EDIT2
This chart actually shows that both of these codes refer to ﺭ
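One way to check what the two byte sequences actually are is to decode them to Unicode code points. The sketch below is only an illustration and not part of the original post; the helper name codepoint() is made up, and it assumes the mbstring extension is available. It suggests that efbaad is U+FEAD (ARABIC LETTER REH ISOLATED FORM, from the Arabic Presentation Forms-B block) while d8b1 is U+0631 (ARABIC LETTER REH), so the two look alike but are different characters.

```php
<?php
// Hypothetical helper: format the code point of one UTF-8 character as U+XXXX.
function codepoint($char) {
    // UTF-32BE stores each code point as a 4-byte big-endian integer.
    $hex = bin2hex(mb_convert_encoding($char, 'UTF-32BE', 'UTF-8'));
    return sprintf('U+%04X', hexdec($hex));
}

echo codepoint(hex2bin('efbaad')), "\n"; // U+FEAD (the array key)
echo codepoint(hex2bin('d8b1')), "\n";   // U+0631 (the character in $str)
```

If that is the case, the keys of $mapping were probably typed or pasted as presentation forms, whereas the string uses the regular Arabic block.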
Upvotes: 0
Views: 2309
Reputation: 3724
The problem is that you didn't specify the encoding for both mb_strlen() and mb_substr(); the following works:
$str = "رضي الدين";
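$mapping = array('ﺏ' => 'b', 'ﺕ' => 't', 'ر' => 'c');
$strlen = mb_strlen( $str, "UTF-8" );
for( $i = 0; $i < $strlen; $i++ ) {
    $char = mb_substr( $str, $i, 1, "UTF-8" );
    // Print the mapped value, or the character itself when it has no mapping
    echo isset($mapping[$char]) ? $mapping[$char] : $char;
}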
}
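If the keys still fail to match even with the encoding passed explicitly, as the bin2hex() output in the question suggests, one possible cause is that the keys are Arabic presentation forms (for example U+FEAD) while the input uses the base letters (U+0631). A minimal sketch of one workaround, assuming the intl extension is installed so that the Normalizer class exists, is to apply NFKC normalization to both the keys and the input. The function name transliterate() and the sample mapping are made up for illustration, and the bytes are written as escapes so that the code points are unambiguous.

```php
<?php
// Assumption: keys were entered as presentation forms, as in the question.
$mapping = array(
    "\xEF\xBA\x8F" => 'b', // U+FE8F, isolated form of BEH
    "\xEF\xBA\x95" => 't', // U+FE95, isolated form of TEH
    "\xEF\xBA\xAD" => 'r', // U+FEAD, isolated form of REH
);

// NFKC folds presentation forms into their base letters (U+FEAD becomes U+0631).
$normalized = array();
foreach ($mapping as $key => $value) {
    $normalized[Normalizer::normalize($key, Normalizer::FORM_KC)] = $value;
}

// Hypothetical helper: map each character, leaving unmapped ones unchanged.
function transliterate($str, array $map) {
    $str = Normalizer::normalize($str, Normalizer::FORM_KC);
    $out = '';
    $len = mb_strlen($str, 'UTF-8');
    for ($i = 0; $i < $len; $i++) {
        $char = mb_substr($str, $i, 1, 'UTF-8');
        $out .= isset($map[$char]) ? $map[$char] : $char;
    }
    return $out;
}

// Input uses the base letters U+0631 and U+062A separated by a space.
echo transliterate("\xD8\xB1 \xD8\xAA", $normalized); // r t
```

Retyping the keys with a regular Arabic keyboard layout would avoid the need for normalization, but normalizing both sides also covers input that arrives in presentation forms.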
Upvotes: 2
Reputation: 37365
I suggest using the preg engine, since it works well with UTF-8 natively when the u modifier is set. mb_* is not a bad choice, of course, but I think it's just more complicated.
I've made a sample for your case:
$sData = "رضي الدين";
$rgReplace = [
'ﺏ' => 'b',
'ﺕ' => 't',
'ن' => 'n',
'ي' => 'i',
'د' => 'f',
'ل' => 'l',
'ا' => 'a',
'ر' => 'r',
'ي' => 'i',
'ض' => 'g',
' ' => ' '
];
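$sResult = preg_replace_callback('/./u', function($sChar) use ($rgReplace)
{
    // $sChar[0] is the matched character; keep it as-is when it has no mapping
    return isset($rgReplace[$sChar[0]]) ? $rgReplace[$sChar[0]] : $sChar[0];
}, $sData);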
echo $sResult; //rgi alfin
As for your code, try passing the encoding explicitly (the encoding parameter of the mb_* functions).
Upvotes: 2