Reputation: 79
I'm getting familiar with postgres sql, but having some trouble with pattern matching. I read the documentation and looked through other questions, but couldn't solve this on my own.
I have a field with lots of text data, in the middle of it, numbers with this pattern:
"2021-1234567" (four digits + - + seven digits)
Problem is, it can have other number sequences. Like this:
"Project number 12345678912345 with id 2020-2583697 1456" (in this case, i need to extract 2020-2583697)
In some cases it may be just eleven digits, like this:
"Project 12345678912345 sequence 20202583697 1456" (in this case i need to extract 20202583697)
At first i tried to extract only the numbers (the text is mostly user input) with:
SELECT
SUBSTRING("my_field", '^[0-9]+$' )
FROM
my_table
That didn't help at all... Can anyone help me?
Upvotes: 1
Views: 105
Reputation: 6721
Or this, as I could not manage to force checking for blanks around the pattern without returning those blanks otherwise:
WITH
indata(s) AS (
SELECT 'Project number 12345678912345 with id 2020-2583697 1456'
UNION ALL SELECT 'Project 12345678912345 sequence 20202583697 1456'
)
SELECT
REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1') AS found_token
, s
FROM indata;
found_token | s
--------------+---------------------------------------------------------
2020-2583697 | Project number 12345678912345 with id 2020-2583697 1456
20202583697 | Project 12345678912345 sequence 20202583697 1456
(2 rows)
The pattern used - REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1')
- means: replace ^.*
the beginning of the string and any number of any characters, followed by a blank; then (\d{4}-?\d{7})
four digits, zero or one dash - -?
, and seven digits - and the parentheses around it mean: remember this as the first group; finally: .*$
a blank, then any number of any characters till the end of the string - with group 1: \1
.
Upvotes: 1
Reputation: 1269693
This appears to do what you want:
select substring(str, '[0-9]{4}-?[0-9]{7}')
from (values ('asfasdf 2020-2583697 qererf i0iu0 1234234'),
('asfasdf 20202583697 qererf i0iu0 1234234')
) v(str)
It searches for 4 digits followed by an optional hyphen followed by 7 digits.
Upvotes: 1