Reputation: 81
So I'm trying to split a string with a regex and the split function in Java. The regex should split the string wherever an uppercase letter directly follows a lowercase letter, like this:
hHere // -> should split to ["h", "Here"]
Here is the full example I'm working with:
String str = "1. Test split hHere and not .Here and /Here";
String[] splitString = str.split("(?=\\w+)((?=[^\\s])(?=\\p{Upper}))");
// should split to ["1. Test split h", "Here and not .Here and /Here"]
// print splitString
for (String s : splitString) {
    System.out.println(s);
}
Output I get:
1.
Test split h
Here and not .
Here and /
Here
Output I want:
1. Test split h
Here and not .Here and /Here
I just can't figure out the regex to do this.
Upvotes: 0
Views: 70
Reputation: 22837
As per my original comment.
This option works only with ASCII letters (it will not match non-ASCII letters such as "à" or "À"). In practice, this works for English text.
(?<=[a-z])(?=[A-Z])
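This option works with Unicode letters, so it handles any script that distinguishes upper and lower case. In a Java string literal, remember to double the backslashes: "(?<=\\p{Ll})(?=\\p{Lu})".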
(?<=\p{Ll})(?=\p{Lu})
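A minimal sketch comparing the two patterns (the class and method names are illustrative, not from the answer). It runs both on the string from the question and on a two-letter non-ASCII string, written with Unicode escapes (\u00E0 is "à", \u00C0 is "À") so the source file encoding does not matter.

```java
public class CaseBoundarySplit {
    // ASCII only: a lowercase a-z followed by an uppercase A-Z
    static final String ASCII_PATTERN = "(?<=[a-z])(?=[A-Z])";
    // Unicode: any lowercase letter (Ll) followed by any uppercase letter (Lu)
    static final String UNICODE_PATTERN = "(?<=\\p{Ll})(?=\\p{Lu})";

    static String[] splitAscii(String s) {
        return s.split(ASCII_PATTERN);
    }

    static String[] splitUnicode(String s) {
        return s.split(UNICODE_PATTERN);
    }

    public static void main(String[] args) {
        String str = "1. Test split hHere and not .Here and /Here";
        // Both patterns split only between "h" and "Here"
        for (String s : splitAscii(str)) {
            System.out.println(s);
        }
        for (String s : splitUnicode(str)) {
            System.out.println(s);
        }

        // Lowercase a-grave followed by uppercase A-grave
        String accented = "\u00E0\u00C0";
        System.out.println(splitAscii(accented).length);   // prints 1 (no split)
        System.out.println(splitUnicode(accented).length); // prints 2
    }
}
```

For ASCII-only input the two patterns give identical results; they differ only once letters outside a-z/A-Z appear.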
(?<=[a-z])
Positive lookbehind ensuring what precedes is a character in the set a-z (a lowercase ASCII letter).
(?=[A-Z])
Positive lookahead ensuring what follows is a character in the set A-Z (an uppercase ASCII letter).
(?<=\p{Ll})
Positive lookbehind ensuring what precedes is a character in \p{Ll} (the Unicode general category "lowercase letter").
(?=\p{Lu})
Positive lookahead ensuring what follows is a character in \p{Lu} (the Unicode general category "uppercase letter").
Upvotes: 1
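To see why the lookbehind is needed, here is a small sketch (class and method names are hypothetical) that splits the question's string first with the lookahead alone and then with lookbehind plus lookahead. With only (?=[A-Z]), Java splits before every uppercase letter, which reproduces the five pieces reported in the question; adding (?<=[a-z]) keeps only the "h" / "Here" boundary.

```java
public class LookaroundParts {
    // Lookahead only: splits before every uppercase ASCII letter
    static String[] splitBeforeEveryUpper(String s) {
        return s.split("(?=[A-Z])");
    }

    // Lookbehind + lookahead: splits only where a lowercase letter precedes
    static String[] splitAtLowerUpper(String s) {
        return s.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        String str = "1. Test split hHere and not .Here and /Here";
        // Brackets make the trailing spaces/punctuation of each piece visible
        for (String part : splitBeforeEveryUpper(str)) {
            System.out.println("[" + part + "]");
        }
        System.out.println("---");
        for (String part : splitAtLowerUpper(str)) {
            System.out.println("[" + part + "]");
        }
    }
}
```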
Reputation: 54168
You can use an easier pattern: (?<=\p{Ll})(?=\p{Lu})
(?<= )
ensures that the given pattern matches, ending at the current position in the expression.
(?= )
asserts that the given subpattern can be matched here, without consuming characters.
Neither of them consumes any characters, which is very important!
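A short sketch of why zero width matters (names are illustrative): String.split() drops whatever text the delimiter matches, so a consuming pattern such as [a-z][A-Z] removes the "hH" pair, while the lookaround version matches only the empty position between the two letters and keeps both.

```java
import java.util.Arrays;

public class ZeroWidthDemo {
    // Consuming delimiter: the matched "hH" is removed from the result
    static String[] splitConsuming(String s) {
        return s.split("[a-z][A-Z]");
    }

    // Zero-width delimiter: matches the empty position between "h" and "H"
    static String[] splitZeroWidth(String s) {
        return s.split("(?<=\\p{Ll})(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        String s = "1. Test split hHere";
        System.out.println(Arrays.toString(splitConsuming(s))); // [1. Test split , ere]
        System.out.println(Arrays.toString(splitZeroWidth(s))); // [1. Test split h, Here]
    }
}
```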
Old version (it does not work for other alphabets):
str.split("(?<=[a-z])(?=[A-Z])");
Upvotes: 2