Reputation: 2752
I want to split a text string that might look like this:
(((Hello!
--> (((
and Hello!
or
########No?
--> ########
and No?
The string starts with the same special character repeated n times, and I want to match the longest possible run of it.
What I have at the moment is this regex:
([^a-zA-Z0-9])\\1+([a-zA-Z].*)
This one would return for the first example
(
(only 1 time) and Hello!
and for the second
#
and No?
How do I tell the regex that I want the longest possible repetition of the matching character?
I am using RegEx as part of a Java program in case this matters.
Upvotes: 0
Views: 91
Reputation: 626903
I suggest a solution with two regexes: use (?s)(\\W)\\1+\\w.* to check whether the string starts with a repeated non-word character, and if it does, split with the lookaround pattern (?<=\\W)(?=\\w) (the position between a non-word and a word character). Otherwise, just return a list containing the whole string, as if it had not been split:
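import java.util.Arrays;
import java.util.List;

// Zero-width boundary between a non-word and a word character
String ptrn = "(?<=\\W)(?=\\w)";
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
    // Split only if the string starts with a repeated non-word character
    if (str.matches("(?s)(\\W)\\1+\\w.*")) {
        System.out.println(Arrays.toString(str.split(ptrn)));
    } else {
        System.out.println(Arrays.asList(str));
    }
}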
See IDEONE demo
Result:
[(((, Hello!]
[########, No?]
[$%^&^Hello!]
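One caveat with the split approach: the lookaround pattern matches at every boundary between a non-word and a word character, so an input such as (((Hello, world! would come back in three pieces rather than two. A minimal sketch, assuming only the leading run should be separated, passes a limit of 2 to split. The class and method names here are hypothetical and only for illustration:

```java
import java.util.Arrays;
import java.util.List;

class LeadingRunSplit {
    // Same check as above: a non-word character repeated at least once, then a word character.
    private static final String CHECK = "(?s)(\\W)\\1+\\w.*";
    // Zero-width boundary between a non-word and a word character.
    private static final String BOUNDARY = "(?<=\\W)(?=\\w)";

    // Hypothetical helper: splits off the leading run, or returns the input unchanged.
    static List<String> splitLeadingRun(String input) {
        if (input.matches(CHECK)) {
            // The limit of 2 stops after the first boundary, so later
            // non-word/word transitions (", w" below) stay in the second part.
            return Arrays.asList(input.split(BOUNDARY, 2));
        }
        return Arrays.asList(input);
    }

    public static void main(String[] args) {
        System.out.println(splitLeadingRun("(((Hello, world!")); // two elements: "(((" and "Hello, world!"
        System.out.println(splitLeadingRun("$%^&^Hello!"));      // one element: the whole string
    }
}
```

Because the check already guarantees that the string begins with the repeated run followed by a word character, the first boundary found by split is always the end of that run.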
Also, your original regex can be modified to fit the requirement like this:
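import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String ptrn = "(?s)((\\W)\\2+)(\\w.*)";
// Compile once, outside the loop
Pattern p = Pattern.compile(ptrn);
List<String> strs = Arrays.asList("(((Hello!", "########No?", "$%^&^Hello!");
for (String str : strs) {
    Matcher m = p.matcher(str);
    if (m.matches()) {
        // Group 1 is the whole run, Group 3 is the rest of the string
        System.out.println(Arrays.asList(m.group(1), m.group(3)));
    } else {
        System.out.println(Arrays.asList(str));
    }
}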
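To see why the extra outer group matters, here is a small side-by-side sketch. In the pattern from the question, Group 1 wraps only the first special character and the backreference repeats it outside the group, so the repeats are matched but not captured. The class and helper names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunCaptureDemo {
    // Pattern from the question: group 1 wraps only the first character.
    static final Pattern ORIGINAL = Pattern.compile("([^a-zA-Z0-9])\\1+([a-zA-Z].*)");
    // Modified pattern: the outer group 1 wraps the character plus all its repeats.
    static final Pattern MODIFIED = Pattern.compile("(?s)((\\W)\\2+)(\\w.*)");

    // Hypothetical helper: returns group 1 for the input, or null if the input does not match.
    static String leadingGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingGroup(ORIGINAL, "(((Hello!")); // (
        System.out.println(leadingGroup(MODIFIED, "(((Hello!")); // (((
    }
}
```

In both cases the quantifier is already greedy and consumes the whole run; the only difference is which part of the match the capturing group covers.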
The modified regex matches:
(?s) - DOTALL inline modifier (if the string has newline characters, .* will also match them).
((\\W)\\2+) - Group 1 captures the whole run: a non-word character (captured into Group 2) followed by the same character 1 or more times (via the backreference \2).
(\\w.*) - Group 3 captures a word character followed by zero or more characters.
Upvotes: 1