Reputation: 13
The task at hand: I am trying to prepare a specific list of automatically generated ids for printing. They have the format aa-bb-cc-dd-ee-ff-gg... Every tuple matches [a-zA-Z0-9]+ (of indeterminate length), and consecutive tuples are separated by exactly one - delimiter.
There are anywhere between one and nine tuples in every id. If the id has three tuples or fewer, I would return one group. If the id has more than three tuples (4+), I would return two groups, the first composed of the first three tuples and the second of the rest.
Only one string would be treated at a time. Here is the test set:
one1
one1-two2
one1-two2-three3
one1-two2-three3-4a
one1-two2-three3-4a-5a
one1-two2-three3-4a-5a-6a
one1-two2-three3-4a-5a-6a-7a
Concretely that would mean:
one1 -> {"one1"}
one1-two2 -> {"one1-two2"}
one1-two2-three3 -> {"one1-two2-three3"}
one1-two2-three3-4a -> {"one1-two2-three3", "4a"}
one1-two2-three3-4a-5a -> {"one1-two2-three3", "4a-5a"}
one1-two2-three3-4a-5a-6a -> {"one1-two2-three3", "4a-5a-6a"}
one1-two2-three3-4a-5a-6a-7a -> {"one1-two2-three3", "4a-5a-6a-7a"}
Work done up until now (this always properly selects the first group):
(^[a-zA-Z0-9]+$)|(^[a-zA-Z0-9]+[-][a-zA-Z0-9]+$)|(^[a-zA-Z0-9]+[-][a-zA-Z0-9]+[-][a-zA-Z0-9]+$)|(^[a-zA-Z0-9]+[-][a-zA-Z0-9]+[-][a-zA-Z0-9]+)
What I am trying to achieve: start at the end of the capture group, check if it is not the end of the line, start reading after the first '-' char following that point, match until the end of the line.
Additional information: I am using Java's native regex engine.
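The steps described above (match the first group, check whether the input ends there, then read everything after the next '-') can be sketched directly with Matcher.lookingAt() and Matcher.end(). This is only an illustration under stated assumptions: the class name PrefixSplit and the method split are hypothetical, List.of requires Java 9 or later, and the remainder after the delimiter is deliberately not validated.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class PrefixSplit {
    // One to three tuples; applied only to the start of the input.
    private static final Pattern FIRST_GROUP =
            Pattern.compile("[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){0,2}");

    // Returns one or two groups, or an empty list if the input does not fit.
    // Assumption: whatever follows the delimiter is taken as-is, unchecked.
    static List<String> split(String id) {
        Matcher m = FIRST_GROUP.matcher(id);
        if (!m.lookingAt()) {
            return List.of();                  // input does not start with a tuple
        }
        int end = m.end();                     // index just past the first group
        if (end == id.length()) {
            return List.of(m.group());         // three tuples or fewer
        }
        if (id.charAt(end) != '-') {
            return List.of();                  // unexpected character after group 1
        }
        return List.of(m.group(), id.substring(end + 1)); // skip the '-'
    }

    public static void main(String[] args) {
        System.out.println(split("one1-two2"));
        System.out.println(split("one1-two2-three3-4a-5a"));
    }
}
```

Because the tail is not checked, an input with a trailing or doubled delimiter still yields a second group; a single regex with two capture groups avoids that.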
Upvotes: 1
Views: 54
Reputation: 159135
When used with Matcher.matches(), the following regex matches only valid strings and returns up to two capture groups (group 2 is unset when the id has three tuples or fewer):
([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){0,2})(?:-([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*))?
Explanation
(                            Start capture of group 1:
  [a-zA-Z0-9]+                 Match first tuple of group 1
  (?:-[a-zA-Z0-9]+){0,2}       Match 0-2 delimiter+tuple pairs for a total of 1-3 tuples
)                            End capture of group 1
(?:                          Optional:
  -                            Match delimiter initiating group 2
  (                            Start capture of group 2:
    [a-zA-Z0-9]+                 Match first tuple of group 2
    (?:-[a-zA-Z0-9]+)*           Match 0+ delimiter+tuple pairs for a total of 1+ tuples
  )                            End capture of group 2
)?                           End optional
Demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String... args) {
        test("one1",
             "one1-two2",
             "one1-two2-three3",
             "one1-two2-three3-4a",
             "one1-two2-three3-4a-5a",
             "one1-two2-three3-4a-5a-6a",
             "one1-two2-three3-4a-5a-6a-7a",
             "one1-two2-three3-4a-5a-6a-7a-8a",
             "one1_two2"); // fail: invalid character
    }

    private static void test(String... values) {
        // Same pattern as above: group 2 uses * so that any number of trailing tuples is accepted
        Pattern p = Pattern.compile("([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){0,2})(?:-([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*))?");
        for (String value : values) {
            Matcher m = p.matcher(value);
            if (! m.matches()) // matches() requires the whole string to match
                System.out.printf("%s -> NO MATCH%n", value);
            else if (m.start(2) == -1) // capture group 2 not found
                System.out.printf("%s -> {\"%s\"}%n", value, m.group(1));
            else
                System.out.printf("%s -> {\"%s\", \"%s\"}%n", value, m.group(1), m.group(2));
        }
    }
}
Output
one1 -> {"one1"}
one1-two2 -> {"one1-two2"}
one1-two2-three3 -> {"one1-two2-three3"}
one1-two2-three3-4a -> {"one1-two2-three3", "4a"}
one1-two2-three3-4a-5a -> {"one1-two2-three3", "4a-5a"}
one1-two2-three3-4a-5a-6a -> {"one1-two2-three3", "4a-5a-6a"}
one1-two2-three3-4a-5a-6a-7a -> {"one1-two2-three3", "4a-5a-6a-7a"}
one1-two2-three3-4a-5a-6a-7a-8a -> {"one1-two2-three3", "4a-5a-6a-7a-8a"}
one1_two2 -> NO MATCH
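If the upper bound of nine tuples mentioned in the question should be enforced as well, group 2 can be limited to six tuples with {0,5} instead of *. The following is a minimal sketch of that variant wrapped in a reusable method; the names IdSplitter and split are hypothetical, throwing IllegalArgumentException on invalid input is an assumption about the desired behaviour, and List.of requires Java 9 or later.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class IdSplitter {
    private static final String TUPLE = "[a-zA-Z0-9]+";

    // Group 1: 1-3 tuples. Group 2 (optional): 1-6 further tuples,
    // so an id holds at most 9 tuples in total.
    private static final Pattern ID = Pattern.compile(
            "(" + TUPLE + "(?:-" + TUPLE + "){0,2})"
          + "(?:-(" + TUPLE + "(?:-" + TUPLE + "){0,5}))?");

    // Returns one or two groups; rejects anything that is not a valid id.
    static List<String> split(String id) {
        Matcher m = ID.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid id: " + id);
        }
        return m.group(2) == null
                ? List.of(m.group(1))
                : List.of(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(split("one1-two2-three3-4a-5a"));
    }
}
```

Checking m.group(2) == null is equivalent to the m.start(2) == -1 test in the demo; both indicate that the optional part did not participate in the match.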
Upvotes: 0
Reputation: 48741
You don't need to over-complicate things to get around the problem:
(?m)^(\w+(?:-\w+){0,2})(?:-(\w+(?:-\w+)*))?$
(?m) enables the multiline flag, which makes the ^ and $ anchors match the beginning and end of each line respectively. A match starts with one or more word characters, \w+, followed by up to two more -\w+ patterns; together these build the first capturing group. Note that \w also matches the underscore, so unlike [a-zA-Z0-9] it accepts an id such as one1_two2.
The second capturing group contains whatever comes after. If you are sure about the formatting, you can shorten the second part, keeping the delimiter outside the group so that it is not captured:
(?m)^(\w+(?:-\w+){0,2})(?:-(.+))?$
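Since the question uses Java, here is a minimal sketch of how the first multiline pattern might be applied to several ids at once with Matcher.find(). The class name MultilineDemo, the method describe and the " | " separator are hypothetical and chosen only for illustration. The last input line shows that \w, unlike [a-zA-Z0-9], accepts an underscore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class MultilineDemo {
    // (?m) lets ^ and $ anchor at every line, so find() yields one match per valid line.
    private static final Pattern LINE =
            Pattern.compile("(?m)^(\\w+(?:-\\w+){0,2})(?:-(\\w+(?:-\\w+)*))?$");

    // Returns one entry per matching line: "group1" or "group1 | group2".
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            out.add(m.group(2) == null
                    ? m.group(1)
                    : m.group(1) + " | " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        describe("one1\none1-two2-three3-4a\none1_two2")
                .forEach(System.out::println);
    }
}
```

Lines that do not fit the pattern are simply skipped by find(); when only one id is processed at a time, Matcher.matches() without the (?m) flag and the anchors behaves the same way.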
Upvotes: 1