Sebastian
Sebastian

Reputation: 414

Postgresql regexp_matches syntax not working as expected

I use the Postgres regexp_matches function to extract numbers.

The regular expression I use is

4([\s\-\/\.]*?0){3}([\s\-\/\.]*?[12]){1}([\s\-\/\.]*?\d){4}

If I use a tool like https://regexr.com/ to verify if it's working and I apply the following test set

4-0001-1234
5-2342-2344
499999999
4-0001-1234 4.0001.12344  4-0-0-0-1-1234

I get the expected extraction result:

4-0001-1234
4-0001-1234
4.0001.1234
4-0-0-0-1-1234

However, if I use the same expression in Postgresql, it does work well:

SELECT unnest(regexp_matches('4-0001-1234', '4([\s\-\/\.]*?0){3}([\s\-\/\.]*?[12]){1}([\s\-\/\.]*?\d){4}', 'g'));

Result:

0
1
4

What I suspect is, that it has to do with the greediness and/or that the quantifiers like {3} are not applied in the right way. Or it uses the Posix standard for regular expressions, which always seem to differ a little bit from the Java Syntax.

Any suggestions why it is not working and how to fix it?

Upvotes: 0

Views: 2634

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626903

The regexp_matches(string text, pattern text [, flags text]) function returns the captured values:

Return all captured substrings resulting from matching a POSIX regular expression against the string.

You may fix the expression using non-capturing groups:

SELECT unnest(regexp_matches('4-0001-1234 4.0001.12344  4-0-0-0-1-1234', '4(?:[\s/.-]*0){3}(?:[\s/.-]*[12])(?:[\s/.-]*\d){4}', 'g'));

See the online demo.

enter image description here

BTW, you do not need to escape - when it is at the start/end of the bracket expression, and there is no need to escape neither / nor . there. I also suggest removing {1} as a = a{1} in any regex supporting limiting quantifiers.

Upvotes: 2

Related Questions