YOGENDRA SONI
YOGENDRA SONI

Reputation: 31

postgres regex positive lookahead is not working as expected

I want to capture tokens in a text in the following pattern:

The First 2 characters are alphabets and necessary, ends with [A-Z] or [A-Z][0-9] this is optional anything can come in between.

example:

AA123123A1

AA123123A

AA123123123

i want to match and capture start with ([A-Z][A-Z]) in group 1, end with [A-Z] or [A-Z][0-9] in group 3 and everything else between then in group2

Example:

AA123123A1 => [AA,123123,A1]

AA123123A. => [AA,123123,A]

AA123123123 => [AA,123123123,'']

the following regex is working in python but not in postgres.

regex='^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))([A-Za-z][0-9]{0,1}){0,1}$'

In Postgressql

    select regexp_matches('AA2311121A1',
    '^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))(.*)$','x');

result:

{AA,2311121A1,""}

I am trying to explore why positive lookahead behavior is not the same as python, and how to take make positive lookahead in Postgres work in this case.

Upvotes: 1

Views: 179

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627263

You can use

^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$

See the regex demo and a DB fiddle online:

enter image description here

Details:

  • ^ - start of string
  • ([A-Za-z]{2}) - two ASCII letters
  • (.*?) - Group 2: any zero or more chars as few as possible
  • ([A-Za-z][0-9]?)? - Group 3: an optional sequence of an ASCII letter and then an optional digit
  • $ - end of string.

Upvotes: 1

Related Questions