Detecting Hebrew Characters in PHP Strings

Question

In PHP, is there a known safe/reliable way to:

1. Detect, generically, a Hebrew character in a string of plain English characters.
2. Replace that character with something.

I know I could use mb_ereg_replace to replace a fixed set of specific characters. However, I'm interested in scanning a string that might contain any Hebrew character and replacing whatever is found.

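That is, I might have two strings like this:

Look at this hebrew character: ח. Isn't it great?
Look at this other hebrew character: י. It is also great?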

I want a single function that would give me the following strings

Look at this hebrew character: \texthebrew{ח}. Isn't it great?
Look at this other hebrew character: \texthebrew{י}. It is also great?


In theory I know I could scan the string for characters in the Hebrew UTF-8 range and detect those, but how character encoding works for PHP strings has always been a little hazy to me, and I'd rather use a proven, known solution if one exists.

hakre · Accepted Answer

The mb_ereg_replace_callback function is useful in your case. Its regular expression dialect supports named Unicode properties, specifically the Hebrew property, which corresponds to the Hebrew Unicode block (IntlChar::BLOCK_CODE_HEBREW).

All you need to do is to mask the Hebrew segments:

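// Sample input: the letter het plus a combining vowel point (two code points)
$subject = "Look at this hebrew character: חַ. Isn't it great?";
mb_regex_encoding('UTF-8');
var_dump(mb_ereg_replace_callback('\p{Hebrew}+', function ($matches) {
    // $matches[0] holds the whole run of consecutive Hebrew characters
    return vsprintf('\texthebrew{%s}', $matches);
}, $subject));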

Output:

string(65) "Look at this hebrew character: \texthebrew{חַ}. Isn't it great?"

As the output shows, the four bytes that make up the two code points are wrapped together in a single segment.

I don't know of any other way to do that in PHP with that little code.
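Since the question asks for a single function, the call above could be wrapped up roughly as follows. This is only a minimal sketch: the helper name wrapHebrew() is hypothetical, and it assumes the mbstring extension is loaded and that the input is valid UTF-8. If the replacement fails for any reason, the sketch simply returns the input unchanged.

```php
<?php
// Hypothetical helper (the name is not from the answer above).
// Assumes the mbstring extension is available and $text is UTF-8.
function wrapHebrew(string $text): string
{
    // Tell the mb_ereg_* functions to treat pattern and subject as UTF-8.
    mb_regex_encoding('UTF-8');

    // \p{Hebrew}+ matches a run of consecutive Hebrew characters, so a
    // letter and its vowel points land inside one \texthebrew{...} wrapper.
    $result = mb_ereg_replace_callback(
        '\p{Hebrew}+',
        function (array $matches): string {
            // Single quotes keep "\t" as a literal backslash followed by "t".
            return '\texthebrew{' . $matches[0] . '}';
        },
        $text
    );

    // On failure the function does not return a string; fall back to the input.
    return is_string($result) ? $result : $text;
}

echo wrapHebrew("Look at this hebrew character: ח. Isn't it great?"), "\n";
echo wrapHebrew("Look at this other hebrew character: י. It is also great?"), "\n";
```

Strings without any Hebrew characters pass through untouched, so the helper can be applied to every string without checking first.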
