Java regular expression: conditionally split string by capital letters

Question

I am not familiar with regular expressions. Maybe this is a simple problem. Given a string

XYZHelloWorldT

I need to return a string array like

{"XYZ", "Hello", "World", "T"}

That is, split off every word that is either exactly one capital letter followed by one or more lowercase letters, or a run of capital letters followed by a capital letter that starts a new word. Whatever is left over (such as the trailing T) becomes another element of the array.

I could process the characters directly, but I wonder whether I can do this with a regular expression directly in String's split method. I found the question "Java: Split string when an uppercase letter is found", but I am not sure how to use it to solve my problem. Thanks
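For comparison, here is a minimal sketch of the character-by-character approach the question mentions. It assumes the same boundary rule the accepted answer encodes (cut between a lowercase and an uppercase letter, or between two uppercase letters when the second is followed by a lowercase letter) and handles ASCII letters only; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CamelSplitManual {
    // Hypothetical helper: cuts the string at word boundaries by looking at
    // neighbouring characters directly (ASCII letters only).
    static String[] splitWords(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            char prev = s.charAt(i - 1);
            char cur = s.charAt(i);
            // Boundary 1: a lowercase letter followed by an uppercase letter.
            boolean lowerToUpper = isLower(prev) && isUpper(cur);
            // Boundary 2: a run of capitals ends where a capitalized word begins.
            boolean acronymEnd = isUpper(prev) && isUpper(cur)
                    && i + 1 < s.length() && isLower(s.charAt(i + 1));
            if (lowerToUpper || acronymEnd) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        if (start < s.length()) {
            parts.add(s.substring(start)); // whatever is left is the last word
        }
        return parts.toArray(new String[0]);
    }

    static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }
    static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("XYZHelloWorldT"))); // [XYZ, Hello, World, T]
    }
}
```

This works, but the boundary logic is spread over several conditions, which is why a single split pattern is attractive.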

ndnenkov · Accepted Answer

Since you can have multiple consecutive uppercase letters, you need a lookbehind for lowercase as well as a lookahead for uppercase:

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])
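A minimal sketch of plugging this pattern into String.split. Both alternatives are zero-width (only lookarounds), so split cuts between characters without consuming any letters. The class and constant names are illustrative.

```java
import java.util.Arrays;

public class CamelSplit {
    // The pattern from this answer; it matches positions, not characters.
    static final String BOUNDARY = "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    public static void main(String[] args) {
        String[] parts = "XYZHelloWorldT".split(BOUNDARY);
        System.out.println(Arrays.toString(parts)); // [XYZ, Hello, World, T]
    }
}
```

The same pattern also handles an acronym in the middle of a word: "parseHTTPResponse" splits into parse, HTTP and Response.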

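If you want support for other languages, use Unicode-aware character classes. Note that in Java the POSIX classes \p{Lower} and \p{Upper} match only US-ASCII letters by default, so turn on the UNICODE_CHARACTER_CLASS flag, for example with the embedded flag (?U) (Java 7 and later):

(?U)(?<=\p{Lower})(?=\p{Upper})|(?<=\p{Upper})(?=\p{Upper}\p{Lower})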
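A sketch comparing the ASCII pattern with a Unicode-aware variant, assuming Java 7 or later for the embedded (?U) flag (UNICODE_CHARACTER_CLASS). Without that flag, Java's \p{Lower} and \p{Upper} match only [a-z] and [A-Z]. The sample word is illustrative and is written with \u escapes so the source stays ASCII.

```java
import java.util.Arrays;

public class UnicodeCamelSplit {
    static final String ASCII = "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";
    // (?U) makes \p{Lower} and \p{Upper} follow the Unicode properties
    // instead of matching ASCII letters only.
    static final String UNICODE =
            "(?U)(?<=\\p{Lower})(?=\\p{Upper})|(?<=\\p{Upper})(?=\\p{Upper}\\p{Lower})";

    public static void main(String[] args) {
        // "XML" + U-umlaut + "bersicht" + A-umlaut + "nderung"
        String s = "XML\u00DCbersicht\u00C4nderung";
        // ASCII pattern: no split, because the umlaut capitals are not in [A-Z]
        System.out.println(s.split(ASCII).length);   // 1
        // Unicode-aware pattern: splits into XML, the U-umlaut word and the A-umlaut word
        System.out.println(s.split(UNICODE).length); // 3
        System.out.println(Arrays.toString(s.split(UNICODE)));
    }
}
```

Using the Unicode general categories \p{Ll} and \p{Lu} instead of the flagged POSIX classes is another option.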

The first alternative matches the position between a lowercase letter and an uppercase letter. The second matches the position between two uppercase letters when the second one is followed by a lowercase letter, that is, where a run of capitals ends and a capitalized word begins.
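To see why both alternatives are needed, this sketch splits the example string with each one separately and then with both combined. The constant names are illustrative.

```java
import java.util.Arrays;

public class AlternativesDemo {
    // Alternative 1: between a lowercase letter and an uppercase letter.
    static final String LOWER_UPPER = "(?<=[a-z])(?=[A-Z])";
    // Alternative 2: between two capitals where the second starts a capitalized word.
    static final String ACRONYM_END = "(?<=[A-Z])(?=[A-Z][a-z])";

    public static void main(String[] args) {
        String s = "XYZHelloWorldT";
        System.out.println(Arrays.toString(s.split(LOWER_UPPER)));                     // [XYZHello, World, T]
        System.out.println(Arrays.toString(s.split(ACRONYM_END)));                     // [XYZ, HelloWorldT]
        System.out.println(Arrays.toString(s.split(LOWER_UPPER + "|" + ACRONYM_END))); // [XYZ, Hello, World, T]
    }
}
```

The first alternative alone never separates XYZ from Hello, and the second alone never cuts after a lowercase letter.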
