Reputation: 31
I want to capture tokens in a text in the following pattern:
The First 2 characters are alphabets and necessary, ends with [A-Z] or [A-Z][0-9] this is optional anything can come in between.
example:
AA123123A1
AA123123A
AA123123123
i want to match and capture start with ([A-Z][A-Z]) in group 1, end with [A-Z] or [A-Z][0-9] in group 3 and everything else between then in group2
Example:
AA123123A1 => [AA,123123,A1]
AA123123A. => [AA,123123,A]
AA123123123 => [AA,123123123,'']
the following regex is working in python but not in postgres.
regex='^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))([A-Za-z][0-9]{0,1}){0,1}$'
In Postgressql
select regexp_matches('AA2311121A1',
'^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))(.*)$','x');
result:
{AA,2311121A1,""}
I am trying to explore why positive lookahead behavior is not the same as python, and how to take make positive lookahead in Postgres work in this case.
Upvotes: 1
Views: 179
Reputation: 627263
You can use
^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$
See the regex demo and a DB fiddle online:
Details:
^
- start of string([A-Za-z]{2})
- two ASCII letters(.*?)
- Group 2: any zero or more chars as few as possible([A-Za-z][0-9]?)?
- Group 3: an optional sequence of an ASCII letter and then an optional digit$
- end of string.Upvotes: 1