tpdietz

Reputation: 1368

JavaScript regex to replace a word that may contain accented characters

I've been trying to come up with a regex that will replace a word that may or may not contain accented characters. I've been researching this for the past couple of days, but cannot find the information I need to solve my problem.

I came up with a simple regex that works well for words without accented characters:

var re = new RegExp('(?:\\b)hello(?:\\b)', 'gm');
var string = 'hello hello hello world hellos hello';
string.replace(re, "FOO");

Result: FOO FOO FOO world hellos FOO

The above works as I want. The problem arises when the word has an accented character as its first or last character. Example:

var re = new RegExp('(?:\\b)helló(?:\\b)', 'gm');
var string = 'helló helló helló world hellós helló';
string.replace(re, "FOO");

Result: helló helló helló world FOOs helló

Desired result: FOO FOO FOO world hellós FOO

As I understand it, this happens because \b treats an accented character as a non-word character, so the accented character itself acts as a boundary. My attempt at solving the problem (note: the range [A-zÀ-ÿ] is what I consider the valid alphabet for constructing a word):

var re = new RegExp('([^A-zÀ-ÿ]|^)helló([^A-zÀ-ÿ]|$)', 'gm');
var string = 'helló helló helló world hellós helló';
string.replace(re, "$1FOO$2");

Result: FOO helló FOO world hellós FOO

As you can see, I'm much closer to the desired result. However, a problem occurs when the word in question appears three or more times in a row: note that the second occurrence of helló was ignored. I believe that's because the whitespace preceding it was already consumed by the match for the first occurrence of helló.

Does anybody have any suggestions on how to achieve FOO FOO FOO world hellós FOO?

Upvotes: 0

Views: 1797

Answers (1)

DaCrazyCoder

Reputation: 51

The answer is a little complex, but the reason you are struggling with this is explained in the following question: Why can't I use accented characters next to a word boundary?
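
In short, \b is defined in terms of \w, and without Unicode-aware matching \w only covers the ASCII characters [A-Za-z0-9_]. A quick sketch to illustrate (the helper name isWordChar is just made up for this demo):

```javascript
// \w only matches ASCII letters, digits and underscore,
// so an accented letter counts as a non-word character.
var isWordChar = function (ch) { return /\w/.test(ch); };

var plainIsWord = isWordChar('o');     // true
var accentedIsWord = isWordChar('ó');  // false

// \b needs a word character on exactly one side of the position.
// 'ó' followed by a space: both are non-word characters, so no boundary.
var boundaryBeforeSpace = /helló\b/.test('helló world'); // false
// 'ó' followed by 's': non-word then word, so \b matches inside "hellós".
var boundaryBeforeS = /helló\b/.test('hellós');          // true

console.log(plainIsWord, accentedIsWord, boundaryBeforeSpace, boundaryBeforeS);
```

That is why the \b version skips the standalone helló but matches the one inside hellós.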

However, given the lack of good Unicode support in JavaScript, especially before ECMAScript 6 (I've had this issue myself in the past), I have found that it is often better to use a third-party library with better Unicode support, such as: http://xregexp.com/

This also smooths over some of the differences in regex support between older browsers.
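
If pulling in a library isn't an option, here is a minimal sketch of two native approaches. Treat it as a starting point rather than a definitive fix. The helper names (escapeRegExp, replaceWord, replaceWordUnicode) are made up for illustration. The first keeps the question's idea of an explicit alphabet but checks the trailing boundary with a lookahead, so the separator is not consumed and stays available to the next match. I've assumed A-Za-zÀ-ÿ rather than A-z, because A-z also covers the punctuation between Z and a in ASCII. The second assumes an engine with ES2018 lookbehind and Unicode property escapes, so it will not run in older browsers.

```javascript
// Escape regex metacharacters so the search word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Approach 1: explicit alphabet (an assumption: Latin-1 letters only;
// note that À-ÿ also includes the × and ÷ signs).
function replaceWord(text, word, replacement) {
  var letters = 'A-Za-zÀ-ÿ';
  var re = new RegExp(
    '(^|[^' + letters + '])' +      // leading boundary, captured and put back
    escapeRegExp(word) +
    '(?=[^' + letters + ']|$)',     // trailing boundary, checked but not consumed
    'gm'
  );
  return text.replace(re, function (match, before) {
    return before + replacement;
  });
}

// Approach 2: Unicode-aware boundaries (assumes ES2018 support for
// lookbehind and \p{...} with the "u" flag).
function replaceWordUnicode(text, word, replacement) {
  var wordChar = '[\\p{L}\\p{M}\\p{N}_]';
  var re = new RegExp(
    '(?<!' + wordChar + ')' + escapeRegExp(word) + '(?!' + wordChar + ')',
    'gu'
  );
  return text.replace(re, function () { return replacement; });
}

var input = 'helló helló helló world hellós helló';
console.log(replaceWord(input, 'helló', 'FOO'));        // FOO FOO FOO world hellós FOO
console.log(replaceWordUnicode(input, 'helló', 'FOO')); // FOO FOO FOO world hellós FOO
```

Both versions pass the replacement through a function, so any $ in the replacement text is inserted literally instead of being read as a group reference.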

Upvotes: 2
