Reputation: 69
I was trying to make a Regex to extract some information with the SUBSTRING function, but the REGEX that I've tried on https://regex101.com/r/cFy11t/1 return an error on postgresql : "quantifier operand invalid"
The goal of this substring is to extract the last number in the string, and include complement as "A, B... G, or BIS, TER" if the number is like "12B of street X". If several numbers are in the string, it should pick the last one, except if the last one is at the end of the string.
For example, in the string "123 47F ABC 33 BIS", the result should be "47F"
I've already changed a lot the code since I wasn't getting the expected output, but I only get an error since I've forced the greedy quantifier "++" and "?+" to get the letter if possible
So here is the last version of my code where postgres raise an error: (but you can click on the link up to see it with color it might be more clear)
SELECT SUBSTRING(Adresse, '(\d++((?:\s)?([A-G]|BIS|TER|QUARTER)?+\s)(?!$))(?!(.*\d+(\s)?[A-G]?+\D))') Numero,
...
Thank you for your time !
Upvotes: 2
Views: 434
Reputation: 382
I couldn't understand your question clearly but still, I gave it a try. If my answer is wrong, I would appreciate a few examples with the outcome you want.
Regex: '(\d+[A-G]+)'
Link to example: https://regex101.com/r/cFy11t/4
Update:
New Regex: '(?:\d+\w(?=\s\d+[A-z]{2,}))|(\d+[A-z])(?=\s\d+[A-z])'
Link to example: https://regex101.com/r/cFy11t/11
I hope this will fulfil your requirement. If not let me know.
Upvotes: 0
Reputation: 627022
You may use
'^.*\y(\d+(?!\s?(?:BIS|TER|QUARTER|[A-G])$)\s?(?:BIS|TER|QUARTER|[A-G])?\y)'
See the online demo
The main point is to get rid of possessive quantifiers and make sure you apply the end of string check taking into account that the trailing patterns are optional.
Details
^
- start of string.*\y
- any 0+ chars, as many as possible, up to the last word boundary (followed with...)(\d+(?!\s?(?:BIS|TER|QUARTER|[A-G])$)\s?(?:BIS|TER|QUARTER|[A-G])?\y)
- Group 1 (this will be returned by the SUBSTRING
):
\d+
- 1+ digits(?!\s?(?:BIS|TER|QUARTER|[A-G])$)
- not followed with an optional whitespace and then BIS
, or TER
, or QUARTER
or a letter from A
through G
at the end of the string\s?
- an optional whitespace(?:BIS|TER|QUARTER|[A-G])?
- an optional match for BIS
, or TER
, or QUARTER
or a letter from A
through G
\y
- a word boundary.Upvotes: 1