ray_voelker
ray_voelker

Reputation: 505

Trying to write an SQL query with regexp_matches() look behind positive in postgresql

From a PostgreSQL database, I'm trying to match 6 or more digits that come after a string that looks like "(OCoLC)" and I thought I had a working regular expression that would fit that description:

(?<=\(ocolc\))[0-9]{6,}

Here are some strings that it should return the digits for:

|a(OCoLC)08507541 will return 08507541

|a(OCoLC)174097142 will return 174097142

etc...

This seems to work to match strings when I test it on regex101.com, but when I incorporate it into my query:

SELECT
regexp_matches(v.field_content, '(?<=\(ocolc\))[0-9]{6,}', 'gi')

FROM
varfield as v

LIMIT 
1;

I get this message:

ERROR: invalid regular expression: quantifier operand invalid

I'm not sure why it doesn't seem to like that expression.

UPDATE I ended up just resorting to using a case statement, as that seemed to be the best way to work around this...

SELECT
CASE
    WHEN v.field_content ~* '\(ocolc\)[0-9]{6,}' 
    THEN (regexp_matches(v.field_content, '[0-9]{6,}', 'gi'))[1]
    ELSE v.field_content
END

FROM
varfield as v

as electricjelly noted, I'm kind of after just the numeric characters, but they have to be preceded by the "(OCoLC)" string, or they're not exactly what I'm after. This is part of a larger query, so I'm running a second case statement a boolean flag in cases where the start of the string wasn't "(OCoLC)". These seems to be more helpful anyway, as I'm going to probably want to preserve those other values somehow.

Upvotes: 1

Views: 1824

Answers (2)

electricjelly
electricjelly

Reputation: 436

After looking over your question it seems your error is caused from a syntax problem, not so much from the function not being available on your version of PostgreSQl, as I tested it on 9.6 and I received the same error.

However, what you seem to want is to pull the numbers from a given field as in

|a(OCoLC)08507541 becomes 08507541

an easy way you could accomplish this would be to use regex_replace

the function would be:

regexp_replace('table.field', '\D', '', 'g')

the \D in the function finds all non-numbers and replaces it with a nothing (hence the '') and returns everything else

Upvotes: 2

ray_voelker
ray_voelker

Reputation: 505

It looks like after doing some more searching, this is only a feature of versions of PostgreSQL server >= 9.6

https://www.postgresql.org/docs/9.6/static/functions-matching.html#POSIX-CONSTRAINTS-TABLE

The version I am running is version 9.4.6

https://www.postgresql.org/message-id/[email protected]

So, the answer is it's not available for this version of PostgreSQL, but presumably this would work just fine in the latest version of the server.

Upvotes: 1

Related Questions