Luan
Luan

Reputation: 79

Specific patterns in Postgresql

I'm getting familiar with postgres sql, but having some trouble with pattern matching. I read the documentation and looked through other questions, but couldn't solve this on my own.

I have a field with lots of text data, in the middle of it, numbers with this pattern:

"2021-1234567" (four digits + - + seven digits)

Problem is, it can have other number sequences. Like this:

"Project number 12345678912345 with id 2020-2583697 1456" (in this case, i need to extract 2020-2583697)

In some cases it may be just eleven digits, like this:

"Project 12345678912345 sequence 20202583697 1456" (in this case i need to extract 20202583697)

At first i tried to extract only the numbers (the text is mostly user input) with:

SELECT
   SUBSTRING("my_field", '^[0-9]+$' )
FROM 
    my_table

That didn't help at all... Can anyone help me?

Upvotes: 1

Views: 105

Answers (2)

marcothesane
marcothesane

Reputation: 6721

Or this, as I could not manage to force checking for blanks around the pattern without returning those blanks otherwise:

WITH 
indata(s) AS (
            SELECT 'Project number 12345678912345 with id 2020-2583697 1456'
  UNION ALL SELECT 'Project 12345678912345 sequence 20202583697 1456'
)
SELECT
  REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1') AS found_token
, s
FROM indata;
 found_token  |                            s                            
--------------+---------------------------------------------------------
 2020-2583697 | Project number 12345678912345 with id 2020-2583697 1456
 20202583697  | Project 12345678912345 sequence 20202583697 1456
(2 rows)

The pattern used - REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1') - means: replace ^.* the beginning of the string and any number of any characters, followed by a blank; then (\d{4}-?\d{7}) four digits, zero or one dash - -?, and seven digits - and the parentheses around it mean: remember this as the first group; finally: .*$ a blank, then any number of any characters till the end of the string - with group 1: \1 .

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269693

This appears to do what you want:

select substring(str, '[0-9]{4}-?[0-9]{7}')
from (values ('asfasdf 2020-2583697 qererf i0iu0 1234234'),
             ('asfasdf 20202583697 qererf i0iu0 1234234')
             ) v(str)

It searches for 4 digits followed by an optional hyphen followed by 7 digits.

Upvotes: 1

Related Questions