Reputation: 43
Hi there I'm trying to extract from an oracle database part of the text contained within a single field using the REGEXP_SUBSTR function. The text in question is shown in bold text below "BRS14774366". The good news is the pattern of the data i'm trying to extract is fairly consistent in that it always begins with "-" and ends with "CSN", however the text between which I'm trying to consistently extract is not always the same and can consist of both alpha and numeric characters and be between 1-12 characters in length.
PSN932-52506252-BRS14774366CSN/SF-1/25JAN0524
Below are further examples showing the slight differences in length, again the text i'm trying to extract is shown in bold. As you can see the position is always the same but the Alpha numeric characters can be anywhere between "-" and "CSN" with differing lengths.
PSN932-49837056-DELAIR09364CSN/SF-66/25JAN0541
PSN932-51231434-H1001865CSN/SF-5/25JAN0546
PSN932-52648256-2EGA814CSN/SF-10/25JAN0549
Taking the first sample data (PSN932-52506252-BRS14774366CSN/SF-1/25JAN0524) I created the below query that outputs the data correctly, however this query doesn't consider that the text can consist of both Alpha/numeric characters with a varying length between 1-12 characters
SELECT REGEXP_SUBSTR('PSN932-52506252-BRS14774366CSN/SF-1/25JAN0524', '-(\D\D\D\d\d\d\d\d\d\d\d)',1 ,1, 'i',1) "REGEXP_SUBSTR" FROM DUAL;
output of above query is as follows:
BRS14774366
Can anyone tell me how to format the match pattern in the query so I can consistently extract the data between "-" and "CSN"?
As always appreciate any assistance people can provide?
Update - it seems the data is stored containing carriage returns so the below query doesn't work:
SELECT
REGEXP_SUBSTR('PSN
932-52506252-BRS14774366
CSN/SF-1/25JAN0524', '-(\w+)CSN', 1, 1, 'i', 1) "REGEXP_SUBSTR"
FROM DUAL;
works fine if the data was like this:
SELECT
REGEXP_SUBSTR('PSN932-52506252-BRS14774366CSN/SF-1/25JAN0524', '-(\w+)CSN', 1, 1, 'i', 1) "REGEXP_SUBSTR" FROM DUAL;
Can this function deal with carriage returns?
Upvotes: 0
Views: 888
Reputation: 85
Even though the query looks ugly, just posting an alternative to regexp related. But the text needs to be duplicated to get the desired result.
SELECT
substr('shusduhash-basb'
|| CHR(10)
|| 'daks-jsbabsCSN/', instr('shusduhash-basb'
|| CHR(10)
|| 'daks-jsbabsCSN/',
'-',
1,
2) + 1,
instr('shusduhash-basb'
|| CHR(10)
|| 'daks-jsbabsCSN/', 'CSN/', 1, 1) - instr('shusduhash-basb'
|| CHR(10)
|| 'daks-jsbabsCSN/', '-', 1, 2) - 1) x
FROM
dual;
Upvotes: 0
Reputation: 4660
Use the alternation operator to account for the carriage return:
SELECT
regexp_substr('PSN932-52506252-BRS1477'
|| CHR(13)
|| '4366CSN/SF-1/25JAN0524','(([[:alnum:]]|['
|| CHR(13)
|| '])+)CSN') "REGEXP_SUBSTR"
FROM
dual;
Upvotes: 0
Reputation: 81
You can use \w to match any alphanumeric character
\w A word character, which is defined as an alphanumeric or underscore () character. It is equivalent to the POSIX class [[:alnum:]]. Note that if you do not want to include the underscore character, you can use the POSIX class [[:alnum:]].
So the pattern should be changed to -(\w+)CSN
Removing the newlines is probably done easiest by replacing the linefeed/carrige return characters.
WITH s AS (select 'SN932-52506252-BRS14774366CSN/SF-1/25JAN0524' n from dual union all
select 'PSN932-49837056-DELAIR09364' || chr(10) || 'CSN/SF-66/25JAN0541' from dual union all
select 'PSN932-51231434-H1001865CSN/SF-5/25JAN0546' from dual union all
select 'PSN932-52648256-2EGA814' || chr(13) || 'CSN/SF-10/25JAN0549' from dual),
remove_newlines as (select replace(replace(s.n, chr(10), ''), chr(13), '') n from s)
SELECT regexp_substr(s.n, '-(\w+)CSN', 1, 1, 'i', 1) "REGEXP_SUBSTR" FROM remove_newlines s;
Upvotes: 1
Reputation: 22969
You can avoid regular expressions for this, by combining instr
and substr
.
This may be not so easy to read, but usually performs better then regexp solutions.
with test(x) as (
select 'PSN932-49837056-DELAIR09364CSN/SF-66/25JAN0541' from dual union all
select 'PSN932-51231434-H1001865CSN/SF-5/25JAN0546' from dual union all
select 'PSN932-52648256-2EGA814CSN/SF-10/25JAN0549' from dual
)
select substr(
substr(x, 1, instr(x, 'CSN') -1),
instr(
substr(x, 1, instr(x, 'CSN') -1),
'-',
-1
)+1
)
from test
This gets the part up to CSN :
substr(substr(x, 1, instr(x, 'CSN') -1)
and then gets the substring of this part starting from the last '-' :
instr(substr(x, 1, instr(x, 'CSN') -1), '-',-1)+1
Upvotes: 0
Reputation: 2496
Is it what you're looking for?
SQL> with
2 s as (select 'SN932-52506252-BRS14774366CSN/SF-1/25JAN0524' n from dual union all
3 select 'PSN932-49837056-DELAIR09364CSN/SF-66/25JAN0541' from dual union all
4 select 'PSN932-51231434-H1001865CSN/SF-5/25JAN0546' from dual union all
5 select 'PSN932-52648256-2EGA814CSN/SF-10/25JAN0549' from dual)
6 select
7 substr(replace(regexp_substr(s.n, '-([[:alpha:]]|[[:digit:]])+CSN'), 'CSN'), 2)
8 from s;
SUBSTR(REPLACE(REGEXP_SUBSTR(S
--------------------------------------------------------------------------------
BRS14774366
DELAIR09364
H1001865
2EGA814
Upvotes: 1