Reputation: 2377
Is there a way to apply the replace method to Unicode text in general (Arabic is my concern here)? In the example below, replacing the whole word works on the English text, but the Arabic word is never detected and therefore never replaced. I added the u flag to enable Unicode parsing, but that didn't help. In the Arabic example below, the standalone word النجوم should be replaced, but not the النجوم inside والنجوم; however, nothing is replaced at all.
<!DOCTYPE html>
<html>
<body>
<p>Click to replace...</p>
<button onclick="myFunction()">replace</button>
<p id="demo"></p>
<script>
function myFunction() {
var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var rep = 'النجوم';
var repWith = 'الليل';
//var str = "the sun and the stars, then the starsz and the day";
//var rep = 'stars';
//var repWith = 'night';
var result = str.replace(new RegExp("\\b"+rep+"\\b", "ug"), repWith);
document.getElementById("demo").innerHTML = result;
}
</script>
</body>
</html>
Whatever solution you offer, please keep using variables as in the code above (the variable rep), because the words to search for are passed in through function calls.
UPDATE: To try it, replace the code in here with the code above.
Upvotes: 5
Views: 430
Reputation: 626758
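In JavaScript, \b is defined in terms of \w, which matches only the ASCII characters [A-Za-z0-9_]; adding the u flag does not make it cover Arabic. Arabic letters therefore count as non-word characters, and \b never matches around them.
A \bword\b pattern can be emulated as (^|[^A-Za-z0-9_])word(?![A-Za-z0-9_]). Because group 1 consumes the character before the word, you need to add $1 before the replacement string to put that character back.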
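A small sketch of that emulation in plain JavaScript, using the English example from the question. The helper names escapeRegExp and replaceWholeWordAscii are illustrative, not part of the answer. escapeRegExp is a commonly used helper that makes the word match literally, which matters because rep comes in through function calls. A replacement function is used instead of '$1' + repWith so that a $ in the replacement text is not interpreted as a pattern.

```javascript
// Escape regex syntax characters so the searched word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Emulates \bword\b with an explicit ASCII word-character class.
function replaceWholeWordAscii(str, rep, repWith) {
  // Group 1: start of input or the non-word character before the word.
  // The lookahead checks the next character without consuming it.
  const re = new RegExp('(^|[^A-Za-z0-9_])' + escapeRegExp(rep) + '(?![A-Za-z0-9_])', 'g');
  // Put the captured character back in front of the replacement.
  return str.replace(re, (match, before) => before + repWith);
}

const english = 'the sun and the stars, then the starsz and the day';
console.log(replaceWholeWordAscii(english, 'stars', 'night'));
// the sun and the night, then the starsz and the day

// The native \b gives the same result on English text...
console.log(english.replace(/\bstars\b/g, 'night'));
// ...but never matches around Arabic letters, since they are not \w characters.
console.log(/\bالنجوم\b/u.test('ثم النجوم والنهار')); // false
```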
Since you need to work with Unicode, it makes sense to use the XRegExp library, which supports the \pL shorthand for any Unicode letter. You can replace A-Za-z in the above pattern with \pL:
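var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var rep = 'النجوم';
var repWith = 'الليل';
// \pL = any Unicode letter; XRegExp.escape makes special characters in rep match literally
var regex = new XRegExp('(^|[^\\pL0-9_])' + XRegExp.escape(rep) + '(?![\\pL0-9_])');
// '$1' restores the character captured before the word; 'all' replaces every match
var result = XRegExp.replace(str, regex, '$1' + repWith, 'all');
console.log(result);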
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
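If you can rely on ES2018 Unicode property escapes (\p{L}, which requires the u flag and is supported by current browsers and Node.js), the same pattern can be written without XRegExp. A minimal sketch; the helper names are hypothetical, and the character classes mirror the XRegExp pattern above.

```javascript
// Escape regex syntax characters; all of these are valid escapes in u mode too.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function replaceWholeWord(str, rep, repWith) {
  // Same shape as the XRegExp pattern, with native \p{L} (any Unicode letter).
  const re = new RegExp('(^|[^\\p{L}0-9_])' + escapeRegExp(rep) + '(?![\\p{L}0-9_])', 'gu');
  // A replacement function keeps any "$" in repWith from being treated as a pattern.
  return str.replace(re, (match, before) => before + repWith);
}

const str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
console.log(replaceWholeWord(str, 'النجوم', 'الليل'));
// الشمس والقمر والنجوم، ثم الليل والنهار
```

If your text contains Arabic diacritics (harakat, Unicode category M), you may want to add \p{M} to both classes so a word followed by a diacritic is not treated as complete.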
UPDATE by @mohsenmadi: To integrate this into an Angular app, follow these steps:
Run npm install xregexp to add the library to package.json, then import the functions and use them:
import { replace, build } from 'xregexp/xregexp-all.js';
let regex = build('(^|[^\\pL0-9_])' + rep + '(?![\\pL0-9_])');
let result = replace(str, regex, '$1' + repWith, 'all');
Upvotes: 3
Reputation:
In case you change your mind and want whitespace boundaries instead, here is the regex.
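// Group 1: start of input or one whitespace character before the word;
// the lookahead requires whitespace or end of input after it.
var Rx = new RegExp(
"(^|[\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A\\u2028-\\u2029\\u202F\\u205F\\u3000])"
+ rep +
"(?![^\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A\\u2028-\\u2029\\u202F\\u205F\\u3000])"
,"ug");
// '$1' restores the whitespace character consumed by group 1
var result = str.replace( Rx, '$1' + repWith );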
Regex explanation
( # (1 start), simulated whitespace boundary
^ # BOL
| # or whitespace
[\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000]
) # (1 end)
text # To find
(?! # Whitespace boundary
[^\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000]
)
In an engine that supports lookbehind assertions, a whitespace boundary is typically written as (?<!\S)text(?!\S).
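JavaScript engines that implement ES2018 lookbehind (current browsers and Node.js) accept that form directly, so no capture group or $1 is needed. A minimal sketch with an illustrative helper name. Note that JavaScript's \s is close to, but not identical with, the explicit list above: it also includes U+FEFF and does not include U+0085.

```javascript
// Escape regex syntax characters so the searched word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function replaceBetweenWhitespace(str, rep, repWith) {
  // (?<!\S): not preceded by a non-whitespace char (start of input or whitespace).
  // (?!\S):  not followed by a non-whitespace char (end of input or whitespace).
  const re = new RegExp('(?<!\\S)' + escapeRegExp(rep) + '(?!\\S)', 'gu');
  // Nothing outside the word is consumed, so the replacement is just repWith.
  return str.replace(re, () => repWith);
}

const str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
console.log(replaceBetweenWhitespace(str, 'النجوم', 'الليل'));
// الشمس والقمر والنجوم، ثم الليل والنهار

// Punctuation is not whitespace, so a word followed by "،" is left alone:
console.log(replaceBetweenWhitespace('ثم النجوم، ثم', 'النجوم', 'الليل'));
// ثم النجوم، ثم
```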
Upvotes: 2