Reputation: 51
I am trying to create a regex for an ID with the following rules:
The delimiters can be omitted if the ID alternates between alpha and numeric parts (A-01a1, A1.a.1). A delimiter is required if consecutive parts are both alpha or both numeric (A-1.1a, A1.2.3, A1a.a).
Here is what I have:
(?P<mi>[A-Z]+)-?(?P<si>[0-9]+)[\-\.]?(?P<mc>[a-z0-9])*[\-\.]?(?P<sc>[a-z0-9])*
Here is the result when I tried it:
ID       mi   si   mc   sc
A1       A    1
A001     A    001
AB-01    AB   01
A1aa     A    1    a         <<<<< expected mc=aa
A-01a1   A    01   1         <<<<< expected mc=a sc=1
A-1.1a   A    1    a         <<<<< expected mc=1 sc=a
A1.a1    A    1    1         <<<<< expected mc=a sc=1
A1.a.1   A    1    a    1
A1.2.3   A    1    2    3
A1a.a    A    1    a    a
Upvotes: 1
Views: 158
Reputation: 15010
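The * quantifiers in your expression should be moved inside the capture groups. As written, each group matches a single character and is then repeated, so only the last character is kept.
You can also remove the backslashes inside the character classes.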
(?P<mi>[A-Z]+)-?(?P<si>[0-9]+)[\-\.]?(?P<mc>[a-z0-9])*[\-\.]?(?P<sc>[a-z0-9])*
                               ^ ^                   ^ ^ ^                   ^
Should look like:
(?P<mi>[A-Z]+)-?(?P<si>[0-9]+)[-.]?(?P<mc>[a-z0-9]*)[-.]?(?P<sc>[a-z0-9]*)
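To see what the corrected pattern captures, here is a minimal Python sketch. The helper name `parse_id` and the choice of `fullmatch` are assumptions made for illustration, not part of the question.

```python
import re

# Corrected pattern from this answer: the * quantifiers sit inside the groups.
PATTERN = re.compile(
    r"(?P<mi>[A-Z]+)-?(?P<si>[0-9]+)[-.]?(?P<mc>[a-z0-9]*)[-.]?(?P<sc>[a-z0-9]*)"
)


def parse_id(text):
    """Return (mi, si, mc, sc) if the whole string matches, else None."""
    m = PATTERN.fullmatch(text)
    return m.group("mi", "si", "mc", "sc") if m else None


# Groups that match nothing come back as empty strings.
for sample in ["A1", "A1aa", "A1.2.3", "A-01a1"]:
    print(sample, parse_id(sample))
```

With this change A1aa yields mc=aa as wanted, but an undelimited alternation such as A-01a1 still ends up entirely in mc (mc=a1, sc empty), because `[a-z0-9]*` accepts letters and digits alike.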
Upvotes: 0
Reputation: 15010
^(?P<MainID>[a-z]+)-?(?<SubID>[0-9]+)(?:[-.]?(?P<MainCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+))?(?:[-.]?(?P<SubCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+))?
The regex does the following:
Followed with an optional a-z or 0-9, one or more times. (Sub category, sc)
If a group of text is preceded by a delimiter and followed by a delimiter or whitespace (such as the end of a line), its characters are allowed to alternate between letters and numbers within the same capture group
If the group is not surrounded by delimiters in this way, then only letters or only numbers are captured
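The behaviour described above can be tried with a small Python sketch. It makes a few assumptions: Python's `re` module wants `(?P<name>...)`, so `(?<SubID>...)` is respelled; the flags `re.IGNORECASE` and `re.MULTILINE` are chosen to mirror the case-insensitive, line-anchored explanation further down; and `parse_ids` is a hypothetical helper name.

```python
import re

# Python port of the pattern above, one ID per line.
ID_PATTERN = re.compile(
    r"^(?P<MainID>[a-z]+)-?(?P<SubID>[0-9]+)"
    r"(?:[-.]?(?P<MainCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+))?"
    r"(?:[-.]?(?P<SubCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+))?",
    re.IGNORECASE | re.MULTILINE,
)


def parse_ids(text):
    """Return one dict of named groups per ID found at the start of a line."""
    return [m.groupdict() for m in ID_PATTERN.finditer(text)]


# Each ID ends with a newline so the lookahead (?=[-.\s]) can succeed.
sample = "A1aa\nA-01a1\nA-1.1a\nA1.a.1\n"
for row in parse_ids(sample):
    print(row)
```

Groups that do not take part in a match are reported as `None`. Note that the lookahead needs a delimiter or whitespace after the group, so an ID at the very end of the input with no trailing newline falls back to the letters-only or digits-only alternatives: `A-1.1a` on its own gives MainCategory=1 and SubCategory=a rather than MainCategory=1a.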
Live Demo
https://regex101.com/r/uH7zF3/1
Sample text
ID mi si mc sc
A1 A 1
A001 A 001
AB-01 AB 01
A1aa A 1 a <<<<< mc=aa
A-01a1 A 01 1 <<<<< mc=a sc=1
A-1.1a A 1 a <<<<< mc=1 sc=a
A1.a1 A 1 1 <<<<< mc=a sc=1
A1.a.1 A 1 a 1
A1.2.3 A 1 2 3
A1a.a A 1 a a
Sample Matches
MATCH 1
MainID [24-25] `A`
SubID [25-26] `1`
MATCH 2
MainID [38-39] `A`
SubID [39-42] `001`
MATCH 3
MainID [54-56] `AB`
SubID [57-59] `01`
MATCH 4
MainID [69-70] `A`
SubID [70-71] `1`
MainCategory [71-73] `aa`
MATCH 5
MainID [104-105] `A`
SubID [106-108] `01`
MainCategory [108-109] `a`
SubCategory [109-110] `1`
MATCH 6
MainID [143-144] `A`
SubID [145-146] `1`
MainCategory [147-149] `1a`
MATCH 7
MainID [182-183] `A`
SubID [183-184] `1`
MainCategory [185-187] `a1`
MATCH 8
MainID [221-222] `A`
SubID [222-223] `1`
MainCategory [224-225] `a`
SubCategory [226-227] `1`
MATCH 9
MainID [243-244] `A`
SubID [244-245] `1`
MainCategory [246-247] `2`
SubCategory [248-249] `3`
MATCH 10
MainID [265-266] `A`
SubID [266-267] `1`
MainCategory [267-268] `a`
SubCategory [269-270] `a`
^ assert position at start of a line
(?P<MainID>[a-z]+) Named capturing group MainID
  [a-z]+ match a single character present in the list below
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
    a-z a single character in the range between a and z (case insensitive)
-? matches the character - literally
  Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
(?<SubID>[0-9]+) Named capturing group SubID
  [0-9]+ match a single character present in the list below
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
    0-9 a single character in the range between 0 and 9
(?:[-.]?(?P<MainCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+))? Non-capturing group
  Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
  [-.]? match a single character present in the list below
    Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    -. a single character in the list -. literally
  (?P<MainCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+) Named capturing group MainCategory
    1st Alternative: (?<=[-.])[a-z0-9]+(?=[-.\s])
      (?<=[-.]) Positive Lookbehind - Assert that the regex below can be matched
        [-.] match a single character present in the list below
          -. a single character in the list -. literally
      [a-z0-9]+ match a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        a-z a single character in the range between a and z (case insensitive)
        0-9 a single character in the range between 0 and 9
      (?=[-.\s]) Positive Lookahead - Assert that the regex below can be matched
        [-.\s] match a single character present in the list below
          -. a single character in the list -. literally
          \s match any white space character [\r\n\t\f ]
    2nd Alternative: [a-z]+
      [a-z]+ match a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        a-z a single character in the range between a and z (case insensitive)
    3rd Alternative: [0-9]+
      [0-9]+ match a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        0-9 a single character in the range between 0 and 9
(?:[-.]?(?P<SubCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+))? Non-capturing group
  Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
  [-.]? match a single character present in the list below
    Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    -. a single character in the list -. literally
  (?P<SubCategory>(?<=[-.])[a-z0-9]+(?=[-.\s])|[a-z]+|[0-9]+) Named capturing group SubCategory
    1st Alternative: (?<=[-.])[a-z0-9]+(?=[-.\s])
      (?<=[-.]) Positive Lookbehind - Assert that the regex below can be matched
        [-.] match a single character present in the list below
          -. a single character in the list -. literally
      [a-z0-9]+ match a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        a-z a single character in the range between a and z (case insensitive)
        0-9 a single character in the range between 0 and 9
      (?=[-.\s]) Positive Lookahead - Assert that the regex below can be matched
        [-.\s] match a single character present in the list below
          -. a single character in the list -. literally
          \s match any white space character [\r\n\t\f ]
    2nd Alternative: [a-z]+
      [a-z]+ match a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        a-z a single character in the range between a and z (case insensitive)
    3rd Alternative: [0-9]+
      [0-9]+ match a single character present in the list below
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        0-9 a single character in the range between 0 and 9
Upvotes: 2
Reputation: 56212
I would use this one:
(?<mi>[A-Z]+)-?(?<si>[0-9]+)[-.]?(?<mc>[a-z0-9]*)[-.]?(?<sc>[a-z0-9]*)
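This pattern uses the `(?<name>...)` spelling for named groups, whereas Python's `re` module expects `(?P<name>...)`. If you want to try it from Python, a rough sketch could look like the following; `to_python_groups` is a hypothetical, deliberately naive helper (it does not handle escaped parentheses), and the test string is made up.

```python
import re

# Pattern as posted in this answer, with (?<name>...) named groups.
POSTED = r"(?<mi>[A-Z]+)-?(?<si>[0-9]+)[-.]?(?<mc>[a-z0-9]*)[-.]?(?<sc>[a-z0-9]*)"


def to_python_groups(pattern):
    """Rewrite (?<name>...) as (?P<name>...).

    Lookbehinds such as (?<= and (?<! are left alone because a group
    name has to start with a letter or underscore.
    """
    return re.sub(r"\(\?<([A-Za-z_]\w*)>", r"(?P<\1>", pattern)


PY_PATTERN = re.compile(to_python_groups(POSTED))
print(PY_PATTERN.fullmatch("AB-01.x.9").groupdict())
```

After the conversion the pattern is character-for-character the same as the corrected `(?P<...>)` version given in the first answer.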
Upvotes: 0