Gideon

Reputation: 1576

Split words on desired regions

I need to split a camel-case string into its words. I'm implementing the split along the lines of the answer to this question, using this pattern:

split(/(?=[A-Z])/)

Everything worked until I ran into this test set:

Cases one to three work fine, but cases four to six should give "Remittance SPD", "FBI Agent" and "FBI Agent NY Department" respectively.
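To reproduce the problem, here is a minimal sketch. The inputs are an assumption: they are the six strings used in the test code of the first answer below, and `naiveSplit` is just an illustrative name.

```javascript
// Assumed test set: the six strings from the first answer's test code.
const samples = ['SalaryGrade', 'ParentChild', 'Maintenance',
                 'RemittanceSPD', 'FBIAgent', 'FBIAgentNYDepartment'];

// Split at every position that is followed by an uppercase letter.
// The lookahead is zero-width, so no characters are consumed.
const naiveSplit = (s) => s.split(/(?=[A-Z])/);

for (const s of samples) {
  console.log(s, '->', naiveSplit(s).join(' '));
}
// RemittanceSPD -> Remittance S P D
// FBIAgent -> F B I Agent
```

Every capital starts a new piece, so a run such as SPD or FBI is broken into single letters.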

How can I select the split points so that a run of consecutive uppercase letters is treated as one word, with the last letter of the run starting the next word when it is followed by a lowercase letter? To be honest, I'm not fond of one-liner regexes and I'm losing hope. I would fall back to a brute-force loop if it weren't for the performance concern.

EDIT: I want this function to handle both words without consecutive uppercase letters and words with consecutive uppercase letters, unlike the other string-splitting questions on this site.

Upvotes: 2

Views: 70

Answers (2)

Wiktor Stribiżew

Reputation: 627082

You may use a matching approach here:

str.match(/[A-Z]+(?![a-z])|[A-Z][a-z]*/g)


Details:

  • [A-Z]+(?![a-z]) - one or more uppercase ASCII letters NOT followed by a lowercase ASCII letter
  • | - or
  • [A-Z][a-z]* - an uppercase ASCII letter followed by zero or more lowercase ASCII letters

var ss = ['SalaryGrade','ParentChild','Maintenance','RemittanceSPD','FBIAgent','FBIAgentNYDepartment'];
var rx = /[A-Z]+(?![a-z])|[A-Z][a-z]*/g; 
for (var s = 0; s < ss.length; s++) { 
  console.log("Testing: ", ss[s], "... ");
  console.log("Matched: ", JSON.stringify(ss[s].match(rx)));
}
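If the goal is the space-separated strings from the question, the matches can simply be joined. A minimal sketch, where `splitCamelCase` is a hypothetical helper name and the regex is the one above:

```javascript
// Hypothetical helper; the regex is the one from this answer.
function splitCamelCase(str) {
  // match() returns null when nothing matches (e.g. an empty string),
  // so fall back to an empty array before joining.
  const words = str.match(/[A-Z]+(?![a-z])|[A-Z][a-z]*/g) || [];
  return words.join(' ');
}

console.log(splitCamelCase('RemittanceSPD'));        // Remittance SPD
console.log(splitCamelCase('FBIAgent'));             // FBI Agent
console.log(splitCamelCase('FBIAgentNYDepartment')); // FBI Agent NY Department
```

Both alternatives start with [A-Z], so a leading lowercase run (as in salaryGrade) or any digits would be skipped rather than matched. The pattern would need extending if such inputs can occur.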

Note that in the case of FBIAgent, only FBI is matched by [A-Z]+(?![a-z]), and this is due to backtracking. The regex engine first grabs the whole uppercase run FBIA with [A-Z]+, but the lookahead fails there because A is followed by the lowercase g. The engine then backtracks one letter to a position where the run is not followed by a lowercase letter, which gives the FBI match, and the A is left to be consumed by the next match.
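To see where each match starts and ends, the matches can be stepped through with exec(). This is only an illustrative sketch (the `steps` array is made up for display purposes):

```javascript
const rx = /[A-Z]+(?![a-z])|[A-Z][a-z]*/g;
const input = 'FBIAgent';
const steps = [];
let m;

// With the g flag, each exec() call resumes from rx.lastIndex.
// Both alternatives consume at least one character, so the loop always advances.
while ((m = rx.exec(input)) !== null) {
  steps.push({ match: m[0], start: m.index, next: rx.lastIndex });
}

console.log(steps);
// [ { match: 'FBI', start: 0, next: 3 }, { match: 'Agent', start: 3, next: 8 } ]
```

The first match stops at index 3, just before the A, which the second alternative then picks up as the start of Agent.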

Upvotes: 2

FrenchMajesty

Reputation: 1149

The following should help:

/(?=[A-Z][a-z])/

Upvotes: -1
