Sagive

Reputation: 1827

After declaring an array of translations including Hebrew characters, str_replace() does not replace substrings as expected

I have an array with Hebrew words that I need to find in a string and replace. The translation array is defined as key-value pairs.

In English, this works.

function replace_twophrase_words($string) {

    $string = strtolower($string);

    $replacements = array (
        'movers'                =>  'Moving Services',
        'home-moving'           =>  'Home Moving',
        'commercial-moving'     =>  'office-moving',
    );

    $string = str_replace($replacements, array_keys($replacements), $string);
}

Hebrew array (requested in the comments):

$replacements = array (
    'עיצוב-פנים'            =>  'עיצוב פנים',
    'עיצוב-פנים'            =>  'מעצבת פנים',
    'עיצוב-פנים'            =>  'עיצוב משרדים',
);

But... it seems that this doesn't work in Hebrew at all. Can anyone explain what's gone wrong?

Upvotes: 0

Views: 568

Answers (2)

mickmackusa

Reputation: 47902

What has tripped you up is not the behavior of native PHP functions, but the way that you've defined your lookup array containing Right-To-Left characters.

Your array is declared as:

$replacements = array (
    'עיצוב-פנים'  =>  'עיצוב פנים',
    'עיצוב-פנים'  =>  'מעצבת פנים',
    'עיצוב-פנים'  =>  'עיצוב משרדים',
);

Notice that with the RTL strings, the direction of the => is also affected. If you print it out with var_export(), you get:

array (
  'עיצוב-פנים' => 'עיצוב משרדים',
)

The three identical strings that appear on the right side of each line are actually the keys, and PHP does not let you have duplicate keys on a single level of an array: each later declaration overwrites the earlier one. So your data not only collapses to a single element, it also doesn't work as intended for the replacement task.
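To see the collapse without the RTL rendering getting in the way, here is a minimal sketch using made-up English strings (the keys and values below are hypothetical stand-ins, not your data). It only illustrates that a repeated key keeps the last value assigned to it.

```php
<?php
// Hypothetical English stand-ins for the Hebrew strings,
// used only so the key/value order is easy to read.
$replacements = array(
    'interior-design' => 'interior design',
    'interior-design' => 'interior designer',
    'interior-design' => 'office design',
);

// The same key is declared three times, so only the last value survives.
var_export($replacements);
// array (
//   'interior-design' => 'office design',
// )

echo "\n", count($replacements), "\n"; // 1
```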


To correct your data, reverse the position of your intended keys and values. [value <= key, ...]

$replacements = array(
    'עיצוב פנים' => 'עיצוב-פנים',
    'מעצבת פנים' => 'עיצוב-פנים',
    'עיצוב משרדים' => 'עיצוב-פנים',
);

Then echo strtr($input, $replacements); will work exactly as desired.
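As a rough sketch of that call, assuming a sample input string that I made up for illustration (the `$input` value is not from your question):

```php
<?php
// Corrected lookup: the phrase to find is the key, the replacement is the value.
$replacements = array(
    'עיצוב פנים' => 'עיצוב-פנים',
    'מעצבת פנים' => 'עיצוב-פנים',
    'עיצוב משרדים' => 'עיצוב-פנים',
);

// Hypothetical input containing two of the phrases, separated by " / ".
$input = 'מעצבת פנים / עיצוב משרדים';

// strtr() with an array tries the longest keys first and never
// rescans text that it has already replaced.
$output = strtr($input, $replacements);

echo $output, "\n";
```

With this input, both phrases should come out as the hyphenated form, and the " / " separator is left untouched.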

str_replace() will also work, but you'll need to pass the find and replacement parameters separately: the array keys as the search terms and the array values as the replacements. Because all of your replacements are the same, if you are going to use str_replace(), you could just write the single string value as the 2nd parameter.
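A minimal sketch of both str_replace() variants, again assuming a made-up `$input` string; the variable names `$viaPairs` and `$viaSingle` are only for illustration:

```php
<?php
$replacements = array(
    'עיצוב פנים' => 'עיצוב-פנים',
    'מעצבת פנים' => 'עיצוב-פנים',
    'עיצוב משרדים' => 'עיצוב-פנים',
);

// Hypothetical input, not taken from the question.
$input = 'מעצבת פנים / עיצוב משרדים';

// Variant 1: keys are the search terms, values are the replacements (same order).
$viaPairs = str_replace(array_keys($replacements), array_values($replacements), $input);

// Variant 2: every replacement is identical, so one string serves all search terms.
$viaSingle = str_replace(array_keys($replacements), 'עיצוב-פנים', $input);

var_dump($viaPairs === $viaSingle); // bool(true)
```

Keep in mind that str_replace() applies the search terms one after another, so an earlier replacement can be matched again by a later search term. That does not happen with this data, but it is one reason to prefer strtr() for lookup arrays.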

Here is a demo of everything discussed above. https://3v4l.org/pms5Z

Upvotes: 0

Kohjah Breese

Reputation: 4136

You need to use mb_ereg_replace:

setlocale( LC_CTYPE, 'en_US.UTF-8' );
header( 'Content-Type: text/plain; charset=utf-8' );
echo mb_ereg_replace( 'עיצוב-פנים', 'עיצוב פנים', 'Hebrew string is here: עיצוב-פנים ; and back to Latin.' );

This is because Hebrew letters are composed of multiple bytes, and str_replace and preg_replace (by default) do not understand these.

Upvotes: 0
