replace/replaceAll with regex on unicode issues

Question

Is there a way to apply the replace method to Unicode text in general (Arabic is the concern here)? In the example below, replacing a whole word works nicely on the English text, but the Arabic word is not detected and therefore not replaced. I added the u flag to enable Unicode parsing, but that didn't help. In the Arabic example below, the word النجوم should be replaced, but not والنجوم, yet this doesn't happen.





Whatever solution you offer, please keep the use of variables as in the code above (such as the variable rep), because the words to be replaced are passed in through function calls.

UPDATE: To try the above code, replace code in here with the code above.

Wiktor Stribiżew · Accepted Answer

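In JavaScript, \b is defined in terms of \w, which covers only [A-Za-z0-9_] even with the u flag, so it cannot detect boundaries around Arabic letters. A \bword\b pattern can be written out explicitly as (^|[^A-Za-z0-9_])word(?![A-Za-z0-9_]). Because the first group consumes the character before the word, you need to put $1 at the start of the replacement pattern to restore it.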
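
To make the boundary emulation concrete, here is a minimal sketch of the ASCII-only version using plain RegExp. The helper name replaceWholeWordAscii and the sample sentence are made up for illustration, and the first group uses a negated class so that it matches a non-word character before the word.

```javascript
// Hypothetical helper: emulate \bword\b with an explicit leading group
// and a trailing negative lookahead (ASCII word characters only).
function replaceWholeWordAscii(text, word, replacement) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(
    '(^|[^A-Za-z0-9_])' + escaped + '(?![A-Za-z0-9_])',
    'g'
  );
  // '$1' puts back the boundary character consumed by the first group.
  return text.replace(regex, '$1' + replacement);
}

console.log(replaceWholeWordAscii('stars and superstars, then stars', 'stars', 'night'));
// prints "night and superstars, then night"
```

The word still comes in through a variable, as the question requires. It is escaped first so that characters such as . or ( in the search word are not treated as regex syntax.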

Since you need to work with Unicode, it makes sense to use the XRegExp library, which supports the shorthand \pL notation for any Unicode letter. Replace A-Za-z in the above pattern with \pL:

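var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var rep = 'النجوم';
var repWith = 'الليل';

// The backslash is doubled so the string literal passes \pL through to XRegExp;
// a single '\p' inside a JavaScript string collapses to a plain 'p'.
var regex = new XRegExp('(^|[^\\pL0-9_])' + rep + '(?![\\pL0-9_])');
// '$1' restores the boundary character captured by the first group.
var result = XRegExp.replace(str, regex, '$1' + repWith, 'all');
console.log(result);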
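
As an alternative to the XRegExp approach above, the same idea can be sketched with native Unicode property escapes. This is not part of the original answer: it assumes a runtime that supports ES2018 \p{L} with the u flag, and the helper name replaceWholeWord is hypothetical.

```javascript
// Hypothetical helper; assumes ES2018 Unicode property escapes are available.
function replaceWholeWord(text, word, replacement) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // \p{L} matches any Unicode letter; it requires the "u" flag.
  const regex = new RegExp(
    '(^|[^\\p{L}0-9_])' + escaped + '(?![\\p{L}0-9_])',
    'gu'
  );
  // '$1' puts back the boundary character consumed by the first group.
  return text.replace(regex, '$1' + replacement);
}

const str = 'الشمس والقمر والنجوم، ثم النجوم والنهار';
console.log(replaceWholeWord(str, 'النجوم', 'الليل'));
// prints "الشمس والقمر والنجوم، ثم الليل والنهار"
```

Note that \p{L} covers letters only. If the text carries Arabic diacritics, which are combining marks, the character classes may need \p{M} as well.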

UPDATE by @mohsenmadi: To integrate in an Angular app, follow these steps:

1. Run npm install xregexp to add the library to package.json.
2. Inside a component, add: import { replace, build } from 'xregexp/xregexp-all.js';
3. Build the regex with: let regex = build('(^|[^\\pL0-9_])' + rep + '(?![\\pL0-9_])');
4. Replace with: let result = replace(str, regex, '$1' + repWith, 'all');
