Reputation: 17023
I know JavaScript regular expressions have native lookaheads but not lookbehinds.
I want to split a string at points either beginning with any member of one set of characters or ending with any member of another set of characters.
Split before ເ, ແ, ໂ, ໃ, ໄ. Split after ະ.
In: ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ
Out: ເລື້ອຍໆມະ ຫັດສະ ຈັນ ເອກອັກຄະ ລັດຖະ ທູດ
I can achieve the "split before" part using zero-width lookahead:
'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ'.split(/(?=[ໃໄໂເແ])/)
["ເລື້ອຍໆມະຫັດສະຈັນ", "ເອກອັກຄະລັດຖະທູດ"]
But I can't think of a general approach to simulating zero-width lookbehind.
I'm splitting strings of arbitrary Unicode text, so I don't want to substitute in special markers in a first pass: I can't guarantee that any given marker string is absent from my input.
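For reference, ECMAScript 2018 added lookbehind assertions, so on an engine that implements them the split can be written as a single zero-width alternation. This is only a sketch under that assumption; the variable names `input` and `parts` are illustrative, and older engines will throw a syntax error on `(?<=...)`.

```javascript
// Assumes an engine with ES2018 lookbehind support.
const input = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ';

// (?<=ະ)        zero-width: the position right after ະ
// (?=[ໃໄໂເແ])  zero-width: the position right before one of the leading vowels
const parts = input.split(/(?<=ະ)|(?=[ໃໄໂເແ])/);

console.log(parts.join(' '));
```

Because both alternatives are zero-width, no characters are consumed and joining the parts gives back the original string.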
Upvotes: 1
Views: 364
Reputation: 70750
Instead of splitting, you may consider using the match() method.
var s = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ',
r = s.match(/(?:(?!ະ).)+?(?:ະ|(?=[ໃໄໂເແ]|$))/g);
console.log(r); //=> [ 'ເລື້ອຍໆມະ', 'ຫັດສະ', 'ຈັນ', 'ເອກອັກຄະ', 'ລັດຖະ', 'ທູດ' ]
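A possible way to package this, with the pattern broken down in comments. The helper name `splitByMatch` is made up for illustration; the regex is the one used above. Note that match() returns null rather than an empty array when nothing matches, which the sketch guards against.

```javascript
// Hypothetical helper; the pattern is the one from the snippet above.
function splitByMatch(text) {
  // (?:(?!ະ).)+?          one or more characters that are not ະ, as few as possible
  // (?:ະ|(?=[ໃໄໂເແ]|$))  then either consume ະ, or stop just before
  //                        a leading vowel or the end of the string
  var pattern = /(?:(?!ະ).)+?(?:ະ|(?=[ໃໄໂເແ]|$))/g;
  // match() yields null when there is no match (for example on '')
  return text.match(pattern) || [];
}

var pieces = splitByMatch('ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ');
console.log(pieces.length); // 6
```

One caveat with matching instead of splitting: each piece needs at least one non-ະ character before the ະ, so a ະ at the very start of the input (or directly after another ະ) is skipped rather than returned. Also, `.` does not match line terminators, so multi-line input would need extra handling.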
Upvotes: 3
Reputation: 95385
If you use parentheses in the delimiter regex, the captured text is included in the returned array. So you can just split on /(ະ)/ and then append each odd-indexed member of the resulting array (the captured ະ) to the even-indexed member before it. Example:
"ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ".split(/(ະ)/).reduce(function(arr, str, index) {
  if (index % 2 == 0) {
    arr.push(str);               // even index: text between delimiters
  } else {
    arr[arr.length - 1] += str;  // odd index: the captured ະ
  }
  return arr;
}, [])
Result: ["ເລື້ອຍໆມະ", "ຫັດສະ", "ຈັນເອກອັກຄະ", "ລັດຖະ", "ທູດ"]
You can then do another pass over each piece to split on the lookahead:
"ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ".split(/(ະ)/).reduce(function(arr, str, index) {
  if (index % 2 == 0) {
    arr.push(str);               // even index: text between delimiters
  } else {
    arr[arr.length - 1] += str;  // odd index: the captured ະ
  }
  return arr;
}, []).reduce(function(arr, str) {
  return arr.concat(str.split(/(?=[ໃໄໂເແ])/));
}, []);
Result: ["ເລື້ອຍໆມະ", "ຫັດສະ", "ຈັນ", "ເອກອັກຄະ", "ລັດຖະ", "ທູດ"]
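One edge case worth noting: when the input ends with ະ, split() on a capturing regex returns a trailing empty string, which the reduce above would keep as an empty last element. A rough sketch of the same two passes as a function that drops such empties; the name `splitTwoPass` is hypothetical and not part of any library.

```javascript
// Hypothetical helper combining both passes described above.
function splitTwoPass(text) {
  // With one capture group, split() returns text, ະ, text, ະ, ..., text.
  var pieces = text.split(/(ະ)/);
  var chunks = [];
  for (var i = 0; i < pieces.length; i += 2) {
    // Glue each captured ະ (odd index, if present) onto the text before it.
    chunks.push(pieces[i] + (pieces[i + 1] || ''));
  }
  // Second pass: split every chunk before a leading vowel, then drop
  // the empty string left over when the input ends with ະ.
  return chunks
    .reduce(function (out, chunk) {
      return out.concat(chunk.split(/(?=[ໃໄໂເແ])/));
    }, [])
    .filter(function (part) { return part !== ''; });
}

console.log(splitTwoPass('ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ').length); // 6
console.log(splitTwoPass('ມະ').length); // 1
```

Filtering at the end, rather than skipping empty pieces inside the loop, keeps a ະ that directly follows another ະ as its own element instead of gluing it onto the previous one.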
Upvotes: 1
Reputation: 174874
You could try matching rather than splitting:
> var re = /((?:(?!ະ).)+(?:ະ|$))/g;
undefined
> var str = "ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ"
undefined
> var m;
undefined
> while ((m = re.exec(str)) != null) {
... console.log(m[1]);
... }
ເລື້ອຍໆມະ
ຫັດສະ
ຈັນເອກອັກຄະ
ລັດຖະ
ທູດ
Then split each of the matched elements again, this time using the lookahead.
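That second step might look roughly like this. It is a sketch only: the names `firstPass` and `result` are illustrative, and match() with the g flag is used here in place of the exec() loop since it collects the same whole matches.

```javascript
var re = /((?:(?!ະ).)+(?:ະ|$))/g;
var str = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ';

// First pass: chunks that end with ະ (or at the end of the string).
var firstPass = str.match(re) || [];

// Second pass: split each chunk before a leading vowel and flatten.
var result = firstPass.reduce(function (out, chunk) {
  return out.concat(chunk.split(/(?=[ໃໄໂເແ])/));
}, []);

console.log(result.join(' '));
```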
Upvotes: 1