Reputation: 601
I am not familiar with regular expressions, so maybe this is a simple problem. Given a string
XYZHelloWorldT
I need to return a string array:
{XYZ Hello World T}
That is, a word is either exactly one capital letter followed by one or more lowercase letters, or a run of multiple capital letters that ends just before a capital letter starting a new word. Whatever remains between those words becomes the other elements of the string array.
I can work on the characters directly, but I wonder whether I could do it with a regular expression directly in String's split method. I found something like "Java: Split string when an uppercase letter is found", but I am not sure how to use it to solve my problem. Thanks.
Upvotes: 1
Views: 279
Reputation: 466
This is an algorithm in Java for finding these words. It is not recommended for large texts, and it does not handle digits or whitespace.
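public class TestString
{
    // Shared cursor into the input; makeArray() advances it word by word.
    static int i = 0, length;
    static char array[];

    public static void main(String[] args){
        String result = "XYZHelloWorldTRTTTePoPoIiiiiiooY";
        array = result.toCharArray();
        length = array.length;
        StringBuilder words = new StringBuilder();
        // Each call consumes one word and leaves i on that word's last character.
        for(; i < length; i++){
            words.append(makeArray());
        }
        String resultOut[] = words.toString().split(",");
        for(String key: resultOut){
            System.out.println(key);
        }
    }

    // Collects characters from position i up to the end of the current word
    // and returns the word followed by a comma.
    private static String makeArray()
    {
        StringBuilder word = new StringBuilder();
        String upper, normal;
        boolean lower = false; // true once the current word contains a lowercase letter
        for(; i < length; ++i){
            word.append(array[i]);
            if(i < length - 2){
                // Is the next character unchanged by toUpperCase(), i.e. not lowercase?
                upper = String.valueOf(array[i+1]).toUpperCase();
                normal = String.valueOf(array[i+1]);
                if(upper.equals(normal)){
                    // Same check for the character after the next one.
                    upper = String.valueOf(array[i+2]).toUpperCase();
                    normal = String.valueOf(array[i+2]);
                    if(upper.equals(normal)){
                        if(lower){
                            break; // a capitalized word ends before the next capital
                        }
                        continue; // still inside a run of capitals
                    }else{
                        break; // the next capital starts a new capitalized word
                    }
                }else{
                    lower = true;
                    continue;
                }
            }else{
                // Fewer than two characters remain after position i.
                if(lower && i < length - 1){
                    String lowerStr = String.valueOf(array[i+1]).toLowerCase();
                    normal = String.valueOf(array[i+1]);
                    if(lowerStr.equals(normal)){
                        continue;
                    }else{
                        break;
                    }
                }
                break;
            }
        }
        word.append(",");
        return word.toString();
    }
}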
What's your plan to use this regex in my algorithm?
Upvotes: 0
Reputation: 36100
Since you can have multiple consecutive uppercase letters, you want a lookbehind for lowercase as well as a lookahead for uppercase:
(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])
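As a rough sketch of how this pattern could be passed to String's split method (the class name CamelCaseSplit and the helper splitWords are made up for illustration, not part of any library):

```java
import java.util.Arrays;

class CamelCaseSplit {
    // Zero-width boundary: between lowercase and uppercase, or between an
    // uppercase letter and an uppercase letter that is followed by lowercase.
    static final String BOUNDARY = "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    // Hypothetical helper: splits the input at every boundary position.
    static String[] splitWords(String input) {
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        // prints [XYZ, Hello, World, T]
        System.out.println(Arrays.toString(splitWords("XYZHelloWorldT")));
    }
}
```

Because the pattern consists only of lookarounds, it matches positions rather than characters, so split does not drop any letters.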
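If you want support for other languages, note that in Java the POSIX character classes \p{Lower} and \p{Upper} match only US-ASCII unless the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS (embedded flag (?U)). Alternatively, use the Unicode categories \p{Ll} and \p{Lu}, written here with doubled backslashes as in a Java string literal:
(?<=\\p{Ll})(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}\\p{Ll})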
The first alternative matches the position between a lowercase letter and an uppercase letter. The second matches the position between two uppercase letters when the second one is followed by a lowercase letter.
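To see which alternative fires at which position, one option is to wrap each alternative in its own capturing group and walk the matches. This is only a sketch: the class BoundaryTrace, the method trace, the labels, and the extra grouping are assumptions added for the illustration, and it uses the ASCII form of the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryTrace {
    // Group 1 wraps the first alternative, group 2 the second one.
    static final Pattern BOUNDARY = Pattern.compile(
        "((?<=[a-z])(?=[A-Z]))|((?<=[A-Z])(?=[A-Z][a-z]))");

    // Returns one line per boundary: "<index>:<alternative that matched>".
    static String trace(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = BOUNDARY.matcher(input);
        while (m.find()) {
            // A group that did not take part in the match returns null.
            String rule = (m.group(1) != null) ? "lower|Upper" : "Upper|UpperLower";
            out.append(m.start()).append(':').append(rule).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints:
        // 3:Upper|UpperLower
        // 8:lower|Upper
        // 13:lower|Upper
        System.out.print(trace("XYZHelloWorldT"));
    }
}
```

For XYZHelloWorldT, the boundary before Hello comes from the second alternative, while the boundaries before World and T come from the first.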
Upvotes: 3