Reputation: 1576
I encountered a case where I need to split a String into words wherever it is in camel case. I'm implementing the split similar to the answer to this question, using this pattern:
split(/(?=[A-Z])/)
Everything was fine until I encountered this test set:
Cases one to three work fine, but four to six should be "Remittance SPD", "FBI Agent", "FBI Agent NY Department" respectively.
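To make the failure concrete, here is a minimal sketch of what the lookahead split does with a few of these strings. The helper name splitBeforeUppercase is only illustrative, and the inputs are taken from the test strings used in this thread:

```javascript
// A few of the test strings discussed in this thread.
var inputs = ['SalaryGrade', 'RemittanceSPD', 'FBIAgent'];

// Illustrative helper (hypothetical name): splits before every uppercase letter.
function splitBeforeUppercase(str) {
  return str.split(/(?=[A-Z])/);
}

inputs.forEach(function (s) {
  console.log(s, '->', JSON.stringify(splitBeforeUppercase(s)));
});
// SalaryGrade -> ["Salary","Grade"]
// RemittanceSPD -> ["Remittance","S","P","D"]
// FBIAgent -> ["F","B","I","Agent"]
```

Because the lookahead fires before every uppercase letter, each letter of an acronym ends up as its own word.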
How can I select the regions in such a way that successive uppercase letters are treated as one word, with the last letter of the sequence starting the next word? I'm not that fond of one-liner regexes, to be honest, and I'm losing all hope. I would fall back to a brute-force loop here if it weren't for performance concerns.
EDIT: I want this function to handle both words without consecutive uppercase letters and words with consecutive uppercase letters, unlike the other questions about splitting strings on this site.
Upvotes: 2
Views: 70
Reputation: 627082
You may use a matching approach here:
str.match(/[A-Z]+(?![a-z])|[A-Z][a-z]*/g)
See the regex demo
Details:
- [A-Z]+(?![a-z]) - 1+ uppercase ASCII letters NOT followed by a lowercase ASCII letter
- | - or
- [A-Z][a-z]* - an uppercase ASCII letter followed by 0+ lowercase ASCII letters

var ss = ['SalaryGrade','ParentChild','Maintenance','RemittanceSPD','FBIAgent','FBIAgentNYDepartment'];
var rx = /[A-Z]+(?![a-z])|[A-Z][a-z]*/g;
for (var s = 0; s < ss.length; s++) {
console.log("Testing: ", ss[s], "... ");
console.log("Matched: ", JSON.stringify(ss[s].match(rx)));
}
Note that in the case of FBIAgent, FBI is matched by [A-Z]+(?![a-z]) only because of backtracking. The regex engine first grabs the FBIA uppercase letters with [A-Z]+, but A is followed by a lowercase letter, so the lookahead fails. The engine then backtracks to the position where the uppercase run is not followed by a lowercase letter. Thus, you get the FBI match, and the A letter remains to be consumed at the next iteration.
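If you need the space-separated strings from the question rather than arrays, the matches can simply be joined. Below is a minimal sketch; the function name splitCamelCase is my own choice for illustration, and the fallback to an empty array is an assumption about how you want non-matching input handled:

```javascript
var rx = /[A-Z]+(?![a-z])|[A-Z][a-z]*/g;

// Illustrative helper (hypothetical name): returns the words joined with spaces.
function splitCamelCase(str) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return (str.match(rx) || []).join(' ');
}

console.log(splitCamelCase('SalaryGrade'));          // Salary Grade
console.log(splitCamelCase('RemittanceSPD'));        // Remittance SPD
console.log(splitCamelCase('FBIAgent'));             // FBI Agent
console.log(splitCamelCase('FBIAgentNYDepartment')); // FBI Agent NY Department

// The backtracking step in isolation: the first alternative alone stops at FBI.
console.log('FBIAgent'.match(/[A-Z]+(?![a-z])/)[0]); // FBI
```

Keep in mind that both alternatives start with an uppercase letter, so a leading lowercase run (as in camelCase) is not matched by this pattern and would be dropped from the result.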
Upvotes: 2