Reputation: 414
I use the Postgres regexp_matches function to extract numbers.
The regular expression I use is
4([\s\-\/\.]*?0){3}([\s\-\/\.]*?[12]){1}([\s\-\/\.]*?\d){4}
If I use a tool like https://regexr.com/ to verify if it's working and I apply the following test set
4-0001-1234
5-2342-2344
499999999
4-0001-1234 4.0001.12344 4-0-0-0-1-1234
I get the expected extraction result:
4-0001-1234
4-0001-1234
4.0001.1234
4-0-0-0-1-1234
However, if I use the same expression in Postgresql, it does work well:
SELECT unnest(regexp_matches('4-0001-1234', '4([\s\-\/\.]*?0){3}([\s\-\/\.]*?[12]){1}([\s\-\/\.]*?\d){4}', 'g'));
Result:
0
1
4
What I suspect is, that it has to do with the greediness and/or that the quantifiers like {3} are not applied in the right way. Or it uses the Posix standard for regular expressions, which always seem to differ a little bit from the Java Syntax.
Any suggestions why it is not working and how to fix it?
Upvotes: 0
Views: 2634
Reputation: 626903
The regexp_matches(string text, pattern text [, flags text])
function returns the captured values:
Return all captured substrings resulting from matching a POSIX regular expression against the string.
You may fix the expression using non-capturing groups:
SELECT unnest(regexp_matches('4-0001-1234 4.0001.12344 4-0-0-0-1-1234', '4(?:[\s/.-]*0){3}(?:[\s/.-]*[12])(?:[\s/.-]*\d){4}', 'g'));
See the online demo.
BTW, you do not need to escape -
when it is at the start/end of the bracket expression, and there is no need to escape neither /
nor .
there. I also suggest removing {1}
as a
= a{1}
in any regex supporting limiting quantifiers.
Upvotes: 2