In PHP, is there a known safe/reliable way to find Hebrew characters in a string and replace them?
I know I could use mb_ereg_replace to replace a specific set of characters. However, I'm interested in scanning a string that might contain any Hebrew character and then replacing those characters with other text.
That is, I might have two strings like these:
<?php
$string1 = "Look at this hebrew character: חַ. Isn't it great?";
$string2 = "Look at this other hebrew character: יַָ. It is also great?";
I want a single function that would give me the following strings:
Look at this hebrew character: \texthebrew{ח}. Isn't it great?
Look at this other hebrew character: \texthebrew{י}. It is also great?
In theory I know I could scan the string for characters in the Hebrew Unicode range and detect those, but how character encoding works for PHP strings has always been a little hazy to me, and I'd rather use a proven/known solution if one exists.
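For comparison, the range-scanning idea can be sketched with PCRE instead of the mbstring functions. This is a minimal sketch, assuming valid UTF-8 input and taking the Hebrew Unicode block (U+0590 to U+05FF) as "the Hebrew range"; `wrap_hebrew_block` is a hypothetical name, not an existing function.

```php
<?php
// Sketch of the "scan for the Hebrew range" idea using PCRE.
// Assumption: the input is valid UTF-8. The /u modifier makes PCRE work on
// code points rather than bytes; on invalid UTF-8 preg_replace_callback()
// returns null, hence the ?string return type.
// U+0590..U+05FF is the Hebrew Unicode block (letters, vowel points, cantillation).
function wrap_hebrew_block(string $text): ?string
{
    return preg_replace_callback(
        '/[\x{0590}-\x{05FF}]+/u',
        function (array $m): string {
            // $m[0] is one whole run of Hebrew code points, so a letter stays
            // together with the vowel points that follow it.
            return '\texthebrew{' . $m[0] . '}';
        },
        $text
    );
}

$string1 = "Look at this hebrew character: חַ. Isn't it great?";
echo wrap_hebrew_block($string1), PHP_EOL;
```

The `u` modifier is the important part: without it PCRE treats the subject as single bytes and a `\x{...}` range above 255 does not work as intended.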
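To detect whether the string contains Hebrew text, use mb_ereg, which searches the whole string (it returns a boolean as of PHP 8.0). Note that mb_ereg_match only matches at the beginning of the string, so mb_ereg_match('\p{Hebrew}+', ...) returns false for strings like the ones in the question:
mb_ereg('\p{Hebrew}', $stringtosearch);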
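A minimal sketch of the detection step, assuming the mbstring extension is available and the input is UTF-8; `contains_hebrew` is a hypothetical helper name. It also contrasts `mb_ereg` with `mb_ereg_match` on one of the question's strings.

```php
<?php
// Minimal sketch: does a string contain at least one Hebrew character?
// mb_ereg() searches the whole string, whereas mb_ereg_match() only tries
// to match starting at the very first character.
mb_regex_encoding('UTF-8');

function contains_hebrew(string $text): bool
{
    // \p{Hebrew} is the Oniguruma Unicode property used in the answers here;
    // one match is enough, so no "+" is needed. The (bool) cast covers
    // PHP 7, where mb_ereg() returns 1 instead of true.
    return (bool) mb_ereg('\p{Hebrew}', $text);
}

$string1 = "Look at this hebrew character: חַ. Isn't it great?";
var_dump(contains_hebrew($string1));              // bool(true)
var_dump(mb_ereg_match('\p{Hebrew}+', $string1)); // bool(false): no Hebrew at position 0
var_dump(contains_hebrew("No Hebrew here"));      // bool(false)
```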
The mb_ereg_replace_callback function is useful in your case. Its regular expression dialect (Oniguruma) supports named Unicode properties, including Hebrew. Strictly speaking, \p{Hebrew} is the Hebrew script property, which largely overlaps the Hebrew Unicode block (IntlChar::BLOCK_CODE_HEBREW).
All you need to do is wrap each run of Hebrew characters:
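<?php
mb_regex_encoding('UTF-8');
$subject = "Look at this hebrew character: חַ. Isn't it great?";
var_dump(mb_ereg_replace_callback('\p{Hebrew}+', function ($matches) {
    // $matches[0] is the whole run of Hebrew code points
    return sprintf('\texthebrew{%s}', $matches[0]);
}, $subject));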
Output:
string(65) "Look at this hebrew character: \texthebrew{חַ}. Isn't it great?"
As the output shows, the four bytes making up the two code points (the letter and its vowel point) are properly wrapped together in one segment.
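To make the byte/code-point distinction concrete, here is a small sketch that builds the wrapped segment from explicit code points, assuming it is HEBREW LETTER HET (U+05D7) followed by HEBREW POINT PATAH (U+05B7). It needs PHP 7.4+ for `mb_str_split`.

```php
<?php
// Illustration of "four bytes, two code points", assuming the segment is
// HET (U+05D7) + PATAH (U+05B7). Each is 2 bytes in UTF-8.
$segment = "\u{05D7}\u{05B7}";

// strlen() counts bytes; mb_strlen() counts code points.
printf("bytes: %d, code points: %d\n", strlen($segment), mb_strlen($segment, 'UTF-8'));
// prints "bytes: 4, code points: 2"

// List each code point; both belong to the Hebrew script, which is why
// \p{Hebrew}+ keeps them together in a single match.
foreach (mb_str_split($segment, 1, 'UTF-8') as $char) {
    printf("U+%04X\n", mb_ord($char, 'UTF-8'));
}
// prints U+05D7 then U+05B7
```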
I don't know of any other way to do that in PHP with that little code.
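Since the question asks for a single function, the mb_ereg_replace_callback approach can be packaged as a reusable helper. This is a sketch, assuming UTF-8 input; `wrap_hebrew` is a hypothetical name, and falling back to the unchanged input on failure is an arbitrary design choice.

```php
<?php
// Sketch of the "single function" from the question, built on
// mb_ereg_replace_callback(). wrap_hebrew() is a hypothetical name.
mb_regex_encoding('UTF-8');

function wrap_hebrew(string $text): string
{
    $result = mb_ereg_replace_callback('\p{Hebrew}+', function (array $matches): string {
        // Each match is a whole run of Hebrew code points (letters plus points).
        return '\texthebrew{' . $matches[0] . '}';
    }, $text);

    // mb_ereg_replace_callback() returns false on error, or null if the
    // input is not valid in the regex encoding; assumption: in that case
    // the caller prefers the original text back.
    return is_string($result) ? $result : $text;
}

$string1 = "Look at this hebrew character: חַ. Isn't it great?";
$string2 = "Look at this other hebrew character: יַָ. It is also great?";

echo wrap_hebrew($string1), PHP_EOL;
echo wrap_hebrew($string2), PHP_EOL;
```

The vowel points stay inside the braces together with their letter, as in the output above, rather than being dropped as in the question's sample output.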