Reputation: 1143
I'm using the following function to highlight a given word, and it works fine in English:
    function highlight(str,toBeHighlightedWord)
    {
        toBeHighlightedWord="(\\b"+ toBeHighlightedWord.replace(/([{}()[\]\\.?*+^$|=!:~-])/g, "\\$1")+ "\\b)";
        var r = new RegExp(toBeHighlightedWord,"igm");
        str = str.replace(/(>[^<]+<)/igm,function(a){
            return a.replace(r,"<span color='red' class='hl'>$1</span>");
        });
        return str;
    }
But it does not work for Arabic text.
How can I modify the regex so that it also matches Arabic words, including Arabic words with tashkel? Tashkel are marks added between the original characters, for example "محمد" (without tashkel) and "مُحَمَّدُ" (with tashkel). The tashkel decorate the word, but each of these little marks is a character of its own.
Upvotes: 8
Views: 802
Reputation: 89567
In JavaScript, the word boundary \b works only with these characters: [a-zA-Z0-9_]. A lookbehind assertion does not help here either, since this feature is not supported by JavaScript engines that predate ES2018.
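To make the \b limitation concrete, here is a small sketch. The variable names are only illustrative, and the Arabic word is written with \u escapes so the code does not depend on how right-to-left text is displayed.

```javascript
// The word "محمد" from the question, written with escapes.
const word = "\u0645\u062D\u0645\u062F";
const withBoundary = new RegExp("\\b" + word + "\\b");

// Surrounded by spaces: there is no [a-zA-Z0-9_] character on either side
// of the Arabic letters, so \b never matches and the word is not found.
const standalone = withBoundary.test("x " + word + " y");

// Glued to Latin letters: \b matches between "a" and the first Arabic
// letter, so the regex matches exactly where a word boundary is NOT wanted.
const glued = withBoundary.test("a" + word + "b");

console.log(standalone, glued); // false true
```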
The way to solve the problem is to "emulate" a kind of word boundary. For the left boundary, use a negated character class that lists the characters that can be part of a word (since it is negated, it matches a character that can't be part of the word) and put it in a capturing group. For the right boundary, a negative lookahead is much simpler:
    // group 1: left "boundary" (a non-word character or the start of the string)
    // group 2: the escaped word itself
    toBeHighlightedWord="([^\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF]|^)("
        + toBeHighlightedWord.replace(/([{}()[\]\\.?*+^$|=!:~-])/g, "\\$1")
        + ")(?![\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF])";
    var r = new RegExp(toBeHighlightedWord, "ig");
    str = str.replace(/(>[^<]+<)/g, function(a){
        return a.replace(r, "$1<span color='red' class='hl'>$2</span>");
    });
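Character ranges that are used here come from three blocks of the Unicode table:
\u0600-\u06FF: Arabic
\uFB50-\uFDFF: Arabic Presentation Forms-A
\uFE70-\uFEFF: Arabic Presentation Forms-B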
Note that the use of a new capturing group changes the replacement pattern.
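The snippet above does not address the tashkel part of the question. The following is a minimal sketch of one way to extend it, not a definitive implementation. The names highlightArabic, escapeRegExp, WORD_CHARS and TASHKEL are hypothetical, and it assumes that tashkel means the combining marks U+064B-U+065F plus U+0670, and that the highlight is styled through the hl class.

```javascript
// Characters treated as "part of a word": ASCII word characters plus the
// three Arabic ranges used in the answer above.
const WORD_CHARS = "\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF";
// Assumption: tashkel = combining marks U+064B-U+065F and U+0670.
const TASHKEL = "[\\u064B-\\u065F\\u0670]";

// Same set of escaped characters as in the original function.
function escapeRegExp(s) {
  return s.replace(/[{}()[\]\\.?*+^$|=!:~-]/g, "\\$&");
}

function highlightArabic(str, word) {
  // Strip tashkel from the search word, then allow any run of marks
  // after each remaining character.
  const bare = word.replace(new RegExp(TASHKEL, "g"), "");
  const core = Array.from(bare, ch => escapeRegExp(ch) + TASHKEL + "*").join("");
  const r = new RegExp(
    "([^" + WORD_CHARS + "]|^)(" + core + ")(?![" + WORD_CHARS + "])", "ig");
  // Only touch text between tags, as in the original function.
  return str.replace(/>[^<]+</g, a =>
    a.replace(r, "$1<span class='hl'>$2</span>"));
}

// Usage: search for the bare word, find the vocalized form in the markup.
const bareWord = "\u0645\u062D\u0645\u062F";
const vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F\u064F";
const result = highlightArabic("<p>" + vocalized + " x</p>", bareWord);
console.log(result);
```

Because the marks are made optional after every letter, searching for either the bare or the vocalized form finds both spellings. The right-hand lookahead still rejects a match that is only the beginning of a longer Arabic word.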
Upvotes: 6