Reputation: 210852
I need a generic Oracle function which takes a CSV string as a first parameter and a regular expression string, which defines a CSV separator as a second parameter and returns a table of parsed strings like as follows:
INPUT data:
NAME PROJECT ERROR
108 test string-1, string-2 ; string-3
109 test2 single string
110 test3 ab, ,c
OUTPUT data:
NAME PROJECT ERROR
108 test string-1
108 test string-2
108 test string-3
109 test2 single string
110 test3 ab
110 test3 NULL
110 test3 c
the separators might be different in different source tables, so I'd like to be able to specify them dynamically as a regex.
How can I create a generic function out of the following code:
with temp as
(
select 108 Name, 'test' Project, 'string-1 , string-2 ; string-3' Error from dual
union all
select 109, 'test2', 'single string' from dual
)
select distinct
t.name, t.project,
trim(regexp_substr(t.error, '[^,;]+', 1, levels.column_value)) as error
from
temp t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,;]+')) + 1) as sys.OdciNumberList)) levels
order by name;
So I'm thinking of a function that takes the following parameters and returns a table of strings
CREATE OR REPLACE FUNCTION csvstr2tab(
p_str IN VARCHAR2,
p_sep_re IN VARCHAR2 DEFAULT '\s*[,;]\s*'
)
PS I have used this answer
UPDATE: please note that I'm using the abbreviation "CSV" here just in order to explain that the input string has multiple values, separated by different separators. I'm dealing with a free text, written by human beings, that used different separators. So the input string doesn't have to be a correct CSV in my case - it's just a string separated by multiple different separators.
Upvotes: 3
Views: 399
Reputation: 6084
If you have APEX installed in your database, there is a function named APEX_STRING.SPLIT that does exactly what you are looking for. You can pass a single character or a regex expression that can be used to split the string. There is also an overloaded version of the function so that the same call can be used to split a VARCHAR2
or a CLOB
.
WITH
test_data (NAME, PROJECT, ERROR)
AS
(SELECT 108, 'test', 'string-1, string-2 ; string-3' FROM DUAL
UNION ALL
SELECT 109, 'test2', 'single string' FROM DUAL
UNION ALL
SELECT 110, 'test3', 'ab, ,c' FROM DUAL)
SELECT name,
project,
error,
TRIM (s.COLUMN_VALUE) as split_value
FROM test_data td, TABLE (apex_string.split (error, '[,;]')) s;
NAME PROJECT ERROR SPLIT_VALUE
_______ __________ ________________________________ ________________
108 test string-1, string-2 ; string-3 string-1
108 test string-1, string-2 ; string-3 string-2
108 test string-1, string-2 ; string-3 string-3
109 test2 single string single string
110 test3 ab, ,c ab
110 test3 ab, ,c
110 test3 ab, ,c c
Upvotes: 1
Reputation: 3697
Try the reproducible example below. Preparation of the scheme:
create table prjerr as
select 108 Name, 'test' Project, 'string-1 , string-2 ; string-3' Error from dual
union all
select 109, 'test2', 'single string' from dual
/
create or replace type tokenList is table of varchar2 (32767)
/
Function implementation:
create or replace function csvstr2tab (
str varchar2, delimiter char := '\s*[,;]\s*') return tokenList is
pattern constant varchar2 (64) := '(.*?)(('||delimiter||')|($))';
tokens tokenList := tokenList ();
s varchar2 (96);
c int := 0;
begin
<<split>> loop c := c + 1;
s := regexp_substr (str, pattern, 1, c, null, 1);
exit split when s is null;
tokens.extend;
tokens(tokens.last) := s;
end loop;
return tokens;
end csvstr2tab;
/
Usage of the function and the result:
select distinct name, project, t.column_value error
from prjerr p, csvstr2tab (p.error) t
order by name
/
NAME PROJE ERROR
---------- ----- ----------------
108 test string-1
108 test string-2
108 test string-3
109 test2 single string
PS It was tested on version 12.2.0.1.0
Upvotes: 2
Reputation: 167991
You can use REGEXP_INSTR
to track the start and end of the regular expression matches so that, at each iteration, the regular expression does not need to restart matching from the beginning of the string.
CREATE FUNCTION regexp_split(
value IN VARCHAR2,
regexp_separator IN VARCHAR2 DEFAULT ','
) RETURN string_list PIPELINED DETERMINISTIC
AS
position PLS_INTEGER := 1;
next_position PLS_INTEGER;
BEGIN
IF value IS NULL THEN
RETURN;
END IF;
LOOP
next_position := REGEXP_INSTR( value, regexp_separator, position, 1, 0 );
IF next_position = 0 THEN
PIPE ROW ( SUBSTR( value, position ) );
EXIT;
ELSE
PIPE ROW ( SUBSTR( value, position, next_position - position ) );
position := REGEXP_INSTR( value, regexp_separator, next_position, 1, 1 );
END IF;
END LOOP;
RETURN;
END;
/
(Note: you can also make the function DETERMINISTIC
.)
Then, for the test data:
CREATE TABLE table_name ( NAME, PROJECT, ERROR ) AS
SELECT 108, 'test1', 'string-1, string-2 ; string-3' FROM DUAL UNION ALL
SELECT 109, 'test2', 'single string' FROM DUAL UNION ALL
SELECT 110, 'test3', 'ab, ,c' FROM DUAL UNION ALL
SELECT 111, 'test4', '1,2,;5,,,9' FROM DUAL;
You can use the function with CROSS APPLY
(or a LATERAL
join) to split the string:
SELECT t.name,
t.project,
s.COLUMN_VALUE AS error
FROM table_name t
CROSS APPLY TABLE( regexp_split( error, '\s*[,;]\s*' ) ) s
Which outputs:
NAME | PROJECT | ERROR ---: | :------ | :------------ 108 | test1 | string-1 108 | test1 | string-2 108 | test1 | string-3 109 | test2 | single string 110 | test3 | ab 110 | test3 | null 110 | test3 | c 111 | test4 | 1 111 | test4 | 2 111 | test4 | null 111 | test4 | 5 111 | test4 | null 111 | test4 | null 111 | test4 | 9
db<>fiddle here
Upvotes: 1
Reputation:
Perhaps something like this. I wrote it as a PL/SQL function, as you requested, but note that if the input data resides in the database, this can be done directly in SQL.
For illustration, I called the function with the default separator.
If you are not familiar with pipelined table function, you can read about them in the documentation.
Note also that in Oracle 12.2, but not in 12.1, you can leave out the table( )
operator - you can select directly "from the function".
create type str_t as table of varchar2(4000);
/
create or replace function csvstr2tab(
p_str in varchar2,
p_sep_re in varchar2 default '\s*[,;]\s*'
)
return str_t
pipelined
as
begin
for i in 1 .. regexp_count(p_str, p_sep_re) + 1 loop
pipe row (regexp_substr(p_str, '(.*?)(' || p_sep_re || '|$)', 1, i, null, 1));
end loop;
return;
end;
/
select *
from table(csvstr2tab('blue ;green,,brown;,yellow;'))
;
COLUMN_VALUE
--------------------
blue
green
[NULL]
brown
[NULL]
yellow
[NULL]
One more test (note that the first row in the output has two trailing spaces, too):
select *
from table(csvstr2tab('blue ;green,,brown;,yellow;', ';'))
;
COLUMN_VALUE
-----------------
blue
green,,brown
,yellow
EDIT
Here is how the function can be used to break input strings into tokens when the inputs are in a table (rows identified by an ID, for example), and keep track of token order as well.
with
sample_data(id, str) as (
select 1201, 'blue ;green,,brown;,yellow;' from dual union all
select 1202, 'tinker, tailor, soldier, ...' from dual
)
select sd.id, sd.str, tf.ord, tf.token
from sample_data sd,
lateral ( select rownum as ord, column_value as token
from table(csvstr2tab(sd.str))
) tf
order by id, ord
;
ID STR ORD TOKEN
------ ---------------------------- ------ --------
1201 blue ;green,,brown;,yellow; 1 blue
1201 blue ;green,,brown;,yellow; 2 green
1201 blue ;green,,brown;,yellow; 3
1201 blue ;green,,brown;,yellow; 4 brown
1201 blue ;green,,brown;,yellow; 5
1201 blue ;green,,brown;,yellow; 6 yellow
1201 blue ;green,,brown;,yellow; 7
1202 tinker, tailor, soldier, ... 1 tinker
1202 tinker, tailor, soldier, ... 2 tailor
1202 tinker, tailor, soldier, ... 3 soldier
1202 tinker, tailor, soldier, ... 4 ...
Upvotes: 3