Gandalf StormCrow

Reputation: 26202

Issues with regex match, too many matches

I have three regexes. Each should match only its own pattern, but for now they match more than their own pattern:

1. Input: test 1-2-22
regex ^([a-z|A-Z|\s]*)(\d*)-(\d*)-(\d*)$
I want to capture "test", "1", "2" and "22" in groups

2. Input: ooi 4-11-58 test^two^ one 1 two
regex ^([a-z|A-Z|\s]*)(\d*)-(\d*)-(\d*)(.+)$
I want to capture "ooi", "4", "11", "58", "test^two^ one 1 two" in groups

3. Input: one two three 3-11 four and five T1 F
regex ^([a-z|A-Z|\s]*)(\d*)-(\d*)(.+)$
I want to capture "one two three", "3", "11", "four and five T1 F" in groups

I'm applying each regex to each input string, and each input should match only one of them.

What happens now is that regex 1 also matches inputs 2 and 3, and regex 2 also matches input 1, so the regexes all overlap with one another.

How can I correct the regexes so that each matches only its own pattern?

Upvotes: 0

Views: 153

Answers (2)

Sergey Kalinichenko

Reputation: 726559

The dot . is too permissive: it matches anything, including dashes and digits. That is why your third expression matches all three inputs: the .+ in it matches the "-58 test^two^ one 1 two" text of the second input, or the "-22" of the first input.
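The over-matching is easy to reproduce. The sketch below applies the question's third regex, unchanged, to all three inputs and prints what the trailing group captured. The class and method names (GreedyDotDemo, lastGroup) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDotDemo {
    // Regex 3 from the question, unchanged (including the stray '|' characters).
    static final Pattern REGEX_3 =
            Pattern.compile("^([a-z|A-Z|\\s]*)(\\d*)-(\\d*)(.+)$");

    // Returns what the trailing (.+) group captured, or null if there is no match.
    static String lastGroup(String input) {
        Matcher m = REGEX_3.matcher(input);
        return m.matches() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "test 1-2-22",
            "ooi 4-11-58 test^two^ one 1 two",
            "one two three 3-11 four and five T1 F"
        };
        for (String input : inputs) {
            // A non-null result means regex 3 accepted the input.
            System.out.println(input + " -> [" + lastGroup(input) + "]");
        }
    }
}
```

All three inputs produce a non-null capture, so regex 3 alone cannot tell the inputs apart.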

You need to add some markers to your regex to distinguish between the patterns. For example, you could modify your #3 to say that the first character of the .+ must be something other than a dash or a digit, like this:

^([a-zA-Z\s]*)(\d*)-(\d*)([^\d-].*)$

Note the [^\d-] character class that I added. It says that the first character of what used to be the .+ in your expression must not be a digit or a dash. This prevents #3 from matching the inputs meant for #1 or #2.
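As a quick check, the following sketch runs the revised expression against the three inputs from the question. The class and method names (NegatedClassDemo, matches) are hypothetical.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Revised regex 3: the tail must start with something other than a digit or a dash.
    static final Pattern REVISED_3 =
            Pattern.compile("^([a-zA-Z\\s]*)(\\d*)-(\\d*)([^\\d-].*)$");

    // True only when the whole input matches the revised expression.
    static boolean matches(String input) {
        return REVISED_3.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("test 1-2-22"));
        System.out.println(matches("ooi 4-11-58 test^two^ one 1 two"));
        System.out.println(matches("one two three 3-11 four and five T1 F"));
    }
}
```

Only the third input is accepted. In the first two, the character after the second number is a dash, which the negated class rejects, and backtracking into the digits does not help because the next character is then a digit.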

Also note that I removed the vertical bars inside the character class, because a | is interpreted literally inside square brackets.

Upvotes: 1

Steve P.

Reputation: 14699

String regex_0 = "^([a-zA-Z]+)\\s+(\\d+)-(\\d+)-(\\d+)$";

String regex_1 = "^([a-zA-Z]+)\\s+(\\d+)-(\\d+)-(\\d+)\\s+([a-zA-Z0-9^\\s]+)$";

String regex_2 = "^([a-zA-Z\\s]+)(\\d+)-(\\d+)\\s+([a-zA-Z0-9\\s]+)$";
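A small harness can confirm that each input is accepted by exactly one expression. This is only a sketch: the class and method names are hypothetical, and the second pattern is written with a ^ inside its last character class, an assumption made here because input 2 contains carets that a plain letters, digits and whitespace class would reject.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ExclusiveMatchDemo {
    static final Pattern[] REGEXES = {
        Pattern.compile("^([a-zA-Z]+)\\s+(\\d+)-(\\d+)-(\\d+)$"),
        // Assumption: '^' is included in the last class so "test^two^" is accepted.
        Pattern.compile("^([a-zA-Z]+)\\s+(\\d+)-(\\d+)-(\\d+)\\s+([a-zA-Z0-9^\\s]+)$"),
        Pattern.compile("^([a-zA-Z\\s]+)(\\d+)-(\\d+)\\s+([a-zA-Z0-9\\s]+)$")
    };

    // Returns the indexes of every regex that matches the whole input.
    static List<Integer> matchingIndexes(String input) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < REGEXES.length; i++) {
            if (REGEXES[i].matcher(input).matches()) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matchingIndexes("test 1-2-22"));
        System.out.println(matchingIndexes("ooi 4-11-58 test^two^ one 1 two"));
        System.out.println(matchingIndexes("one two three 3-11 four and five T1 F"));
    }
}
```

Each input should report a single index (0, 1 and 2 respectively). What separates the patterns is the mandatory \s+ after the last number and the $ anchor, rather than a catch-all .+ at the end.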

Note: [a|b] as a character class does not mean "a or b", it means "a or b or |".
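To illustrate the difference, here is a minimal sketch comparing the character class [a|b] with the alternation (a|b). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PipeInClassDemo {
    // Character class: matches one character that is 'a', '|' or 'b'.
    static boolean classMatches(String s) {
        return Pattern.matches("[a|b]", s);
    }

    // Alternation inside a group: matches "a" or "b" only.
    static boolean groupMatches(String s) {
        return Pattern.matches("(a|b)", s);
    }

    public static void main(String[] args) {
        System.out.println(classMatches("a"));  // true
        System.out.println(classMatches("|"));  // true: the bar is a literal here
        System.out.println(groupMatches("|"));  // false: the bar is an operator here
    }
}
```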

Also, I'm not sure whether you actually want *, which means any number of repetitions, including zero. From what I can tell, you want +, which means one or more.
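The practical effect shows up on inputs with missing numbers. The sketch below compares the question's first regex (with *) to the + version above on an input that has dashes but no digits. The test string "test --" and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class StarVersusPlusDemo {
    // Regex 1 from the question: every part is optional because of '*'.
    static final Pattern WITH_STAR =
            Pattern.compile("^([a-z|A-Z|\\s]*)(\\d*)-(\\d*)-(\\d*)$");

    // The '+' version: each number needs at least one digit.
    static final Pattern WITH_PLUS =
            Pattern.compile("^([a-zA-Z]+)\\s+(\\d+)-(\\d+)-(\\d+)$");

    static boolean starMatches(String input) {
        return WITH_STAR.matcher(input).matches();
    }

    static boolean plusMatches(String input) {
        return WITH_PLUS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(starMatches("test --"));      // true: all three numbers are empty
        System.out.println(plusMatches("test --"));      // false: digits are required
        System.out.println(plusMatches("test 1-2-22"));  // true
    }
}
```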

Upvotes: 1
