nathanbweb

Reputation: 717

Match Hebrew character at word boundary via regex in JavaScript?

I'm able to match and highlight this Hebrew letter in JS:

var myText = $('#text').html();
var myHilite = myText.replace(/(\u05D0+)/g,"<span class='highlight'>$1</span>");
$('#text').html(myHilite);

fiddle

but can't highlight a word containing that letter at a word boundary:

/(\u05D0)\b/g

fiddle

I know that JS is bad at regex with Unicode (and server side is preferred), but I also know that I'm bad at regex. Is this a limit in JS or an error in my syntax?

Upvotes: 4

Views: 1718

Answers (2)

TBE

Reputation: 1133

What about using the following regexp, which covers every position a word can take in a sentence:

/^\u05D0\s|\s\u05D0\s|\s\u05D0$|^\u05D0$/

It actually combines 4 patterns with the OR operator ('|'):

  1. Either the string starts with your exact word followed by a space
  2. OR your string has space + your word + space
  3. OR your string ends with space + your word
  4. OR your string is the exact word only.
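The four cases above can be sketched as a small check. This is only an illustration: `ALEF_WORD` and `hasStandaloneAlef` are hypothetical names, not part of the original answer, and the pattern assumes words are separated by whitespace only (no punctuation or HTML tags).

```javascript
// Hypothetical helper: true when the single letter \u05D0 stands alone
// as a whitespace-delimited word somewhere in the text.
const ALEF_WORD = /^\u05D0\s|\s\u05D0\s|\s\u05D0$|^\u05D0$/;

function hasStandaloneAlef(text) {
  // No "g" flag, so test() keeps no state between calls.
  return ALEF_WORD.test(text);
}

console.log(hasStandaloneAlef("\u05D0 \u05D1"));        // word at the start
console.log(hasStandaloneAlef("\u05D1 \u05D0 \u05D2")); // word in the middle
console.log(hasStandaloneAlef("\u05D1 \u05D0"));        // word at the end
console.log(hasStandaloneAlef("\u05D0"));               // the whole string
console.log(hasStandaloneAlef("\u05D1\u05D0\u05D2"));   // letter inside a longer word
```

Note that this only finds the letter when it is a whole word by itself. It does not find a longer word that merely ends with the letter.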

Upvotes: 0

mrk

Reputation: 5117

I can't read Hebrew... does this regex do what you want?

/(\S*[\u05D0]+\S*)/g

Your first regex, /(\u05D0+)/g matches on only the character you are interested in.

Your second regex, /(\u05D0)\b/g, can only match when the character you are interested in is the last (or last repeated) character before a word boundary, so it won't match that character at the beginning or in the middle of a word.
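There is a further wrinkle that likely explains why the second regex highlights nothing: in JavaScript, \b is defined in terms of \w, and \w only covers [A-Za-z0-9_]. A Hebrew letter therefore counts as a non-word character, so no boundary is seen between it and a following space. A minimal sketch (the variable names are mine, chosen for illustration):

```javascript
// \w is ASCII-only in JavaScript, so a Hebrew letter is treated as \W.
const alefIsWordChar = /\w/.test("\u05D0");

// A Hebrew word ending in alef, followed by a space and another letter:
// alef and the space are both \W, so \b does not match between them.
const boundaryInHebrew = /\u05D0\b/.test("\u05D1\u05D0 \u05D2");

// Alef at the very end of the string: still no boundary, because \b at the
// end of input requires the last character to be a \w character.
const boundaryAtEnd = /\u05D0\b/.test("\u05D1\u05D0");

// Alef followed directly by an ASCII letter: here \b DOES match, because
// the position sits between a \W character (alef) and a \w character ("a").
const boundaryBeforeAscii = /\u05D0\b/.test("\u05D0a");

console.log(alefIsWordChar, boundaryInHebrew, boundaryAtEnd, boundaryBeforeAscii);
```

So this looks like a limitation of \b in JavaScript rather than a syntax error in the pattern.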

EDIT:

Look at this answer:

utf-8 word boundary regex in javascript

Using the info from that answer, I came up with this regex. Is this correct?

/([\u05D0])(?=\s|$)/g
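Here is a rough sketch of how that lookahead could be plugged into the highlighting from the question. `highlightFinalAlef` is a hypothetical name, and it works on a plain string rather than on `$('#text').html()` so that it can be tried outside the browser. It assumes a word ends at whitespace or at the end of the string, so a letter followed by punctuation or an HTML tag would be missed.

```javascript
// Hypothetical helper: wraps \u05D0 in a highlight span only when it is
// the last character of a word (followed by whitespace or end of string).
function highlightFinalAlef(text) {
  // (?=\s|$) is a lookahead: it checks what follows without consuming it,
  // so the whitespace stays outside the inserted span.
  return text.replace(/(\u05D0)(?=\s|$)/g, "<span class='highlight'>$1</span>");
}

// First word ends with alef (highlighted); second word starts with alef (left alone).
const sample = "\u05D1\u05D0 \u05D0\u05D1";
console.log(highlightFinalAlef(sample));
```

In the page itself, the result would be written back with `$('#text').html(...)` as in the question.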

Upvotes: 2
