Reputation: 13
I have the following table:
B_ID     I_ID   R_ID
W00001   1234   1235,1237
B00001   1235   1236,1235
T00001   1236   1235,1235,1235
X00001   1237   1234,1236,1238
M00001   1238   1238
I need the output below, using SQL:
B_ID     I_ID   R_ID
W00001   1234   B00001|X00001
B00001   1235   T00001|B00001
T00001   1236   B00001
X00001   1237   W00001|T00001|M00001
M00001   1238   M00001
Example: in the first row, R_ID has the values 1235,1237. Both 1235 and 1237 are present in I_ID, so their corresponding B_IDs are picked, i.e. B00001 and X00001, and the expected output is B00001|X00001.
Upvotes: 1
Views: 3395
Reputation: 167972
This produces the output without duplicates and does not rely on any magic numbers:
Oracle Setup:
CREATE TABLE test_data ( b_id, i_id, r_id ) as
select 'W00001', 1234, '1235,1237' from dual union all
select 'B00001', 1235, '1236,1235' from dual union all
select 'T00001', 1236, '1235,1235,1235' from dual union all
select 'X00001', 1237, '1234,1236,1238' from dual union all
select 'M00001', 1238, '1238' from dual;
Query:
SELECT b_id,
       i_id,
       ( SELECT LISTAGG( t.b_id, '|' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM   TABLE( CAST( MULTISET(
                  SELECT DISTINCT
                         TO_NUMBER( REGEXP_SUBSTR( d.r_id, '\d+', 1, LEVEL ) )
                  FROM   DUAL
                  CONNECT BY LEVEL <= REGEXP_COUNT( d.r_id, '\d+' )
                ) AS SYS.ODCINUMBERLIST ) ) v
                INNER JOIN test_data t
                ON (v.COLUMN_VALUE = t.i_id) ) AS r_id
FROM   test_data d;
Explanation
The inner correlated select:
SELECT DISTINCT
TO_NUMBER( REGEXP_SUBSTR( d.r_id, '\d+', 1, LEVEL ) )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( d.r_id, '\d+' )
Takes the r_id for a single row and separates it into one row per comma-delimited value; the DISTINCT means only unique values are output.
This is converted into a table collection expression using TABLE( CAST( MULTISET( ... ) AS collection_type ) )
so that it can be joined to another table.
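For example, running just that part against one of the sample strings (a literal is used here in place of the correlated d.r_id, so this is only an illustrative sketch) shows the splitting and de-duplication together:

SELECT COLUMN_VALUE
FROM   TABLE( CAST( MULTISET(
         SELECT DISTINCT
                TO_NUMBER( REGEXP_SUBSTR( '1235,1235,1235', '\d+', 1, LEVEL ) )
         FROM   DUAL
         CONNECT BY LEVEL <= REGEXP_COUNT( '1235,1235,1235', '\d+' )
       ) AS SYS.ODCINUMBERLIST ) );

This returns a single row with the value 1235 (the three duplicates collapse because of the DISTINCT), ready to be joined to test_data.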
This is then self-joined back to test_data to convert the i_ids into their corresponding b_ids, and LISTAGG() is used to re-aggregate the multiple rows back to a single row.
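As a rough illustration of that last step, the aggregation for the first sample row (whose split values are 1235 and 1237) behaves much like this sketch:

SELECT LISTAGG( b_id, '|' ) WITHIN GROUP ( ORDER BY ROWNUM ) AS r_id
FROM   test_data
WHERE  i_id IN ( 1235, 1237 );

which gives B00001|X00001 for the sample data (ORDER BY ROWNUM is not deterministic, so the concatenation order is not guaranteed in general).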
Output:
B_ID         I_ID R_ID
------ ---------- --------------------
W00001       1234 B00001|X00001
B00001       1235 T00001|B00001
T00001       1236 B00001
X00001       1237 W00001|T00001|M00001
M00001       1238 M00001
Oracle Setup:
CREATE OR REPLACE TYPE numberlist IS TABLE OF NUMBER;
/

CREATE OR REPLACE FUNCTION split_Number_List(
  i_str   IN VARCHAR2,
  i_delim IN VARCHAR2 DEFAULT ','
) RETURN numberlist DETERMINISTIC
AS
  p_result numberlist := numberlist();
  p_start  NUMBER(5) := 1;
  p_end    NUMBER(5);
  c_len    CONSTANT NUMBER(5) := LENGTH( i_str );
  c_ld     CONSTANT NUMBER(5) := LENGTH( i_delim );
BEGIN
  IF c_len > 0 THEN
    p_end := INSTR( i_str, i_delim, p_start );
    WHILE p_end > 0 LOOP
      p_result.EXTEND;
      p_result( p_result.COUNT ) := TO_NUMBER( SUBSTR( i_str, p_start, p_end - p_start ) );
      p_start := p_end + c_ld;
      p_end := INSTR( i_str, i_delim, p_start );
    END LOOP;
    IF p_start <= c_len + 1 THEN
      p_result.EXTEND;
      p_result( p_result.COUNT ) := TO_NUMBER( SUBSTR( i_str, p_start, c_len - p_start + 1 ) );
    END IF;
  END IF;
  RETURN p_result;
END;
/
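A quick standalone check of the function (the input string is just one of the sample r_id values):

SELECT COLUMN_VALUE
FROM   TABLE( split_Number_List( '1234,1236,1238' ) );

which returns 1234, 1236 and 1238 as three separate rows.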
Query:
SELECT b_id,
       i_id,
       ( SELECT LISTAGG( t.b_id, '|' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM   TABLE( SET( split_Number_List( d.r_id ) ) ) v
                INNER JOIN test_data t
                ON (v.COLUMN_VALUE = t.i_id) ) AS r_id
FROM   test_data d;
(Same output as above)
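The SET() operator is what removes duplicates from the collection before the join; for example (again a sketch using a literal taken from the sample data):

SELECT COLUMN_VALUE
FROM   TABLE( SET( split_Number_List( '1235,1235,1235' ) ) );

returns a single row with the value 1235, which is why the third output row is B00001 rather than B00001|B00001|B00001.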
Option 3:
SELECT b_id,
       i_id,
       ( SELECT LISTAGG( t.b_id, '|' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM   test_data t
         WHERE  ',' || d.r_id || ',' LIKE '%,' || t.i_id || ',%' ) AS r_id
FROM   test_data d;
You can improve the performance of this option by creating function-based indexes on ',' || r_id || ',' and on '%,' || i_id || ',%'.
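For instance, the first of those might look like this (the index name is my own, and you would want to check the execution plan to confirm the index is actually used):

CREATE INDEX test_data_rid_fbi ON test_data ( ',' || r_id || ',' );

A similar function-based index can be created on the i_id expression.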
(Same output as above)
Upvotes: 1
Reputation:
Decided to add another answer since it uses an entirely different method - recursive subquery factoring, available since Oracle version 11.2.
I did some testing, with an input (persistent) table called test_data
with 9000 rows, each r_id being a comma-separated string of 200 tokens; structure very similar to the OP's small sample in the original post. I tried three methods: the hierarchical query (using connect by
with the prior sys_guid()
trick), which I proposed; the solution based on a correlated subquery and nested table, posted by MT0; and the recursive query I will show below. In each case, I used the query as the select...
portion of a CTAS statement.
(To compare apples to apples I modified MT0's query, removing the extra information that the "tokens" in r_id are numbers - I treated them as strings, as do the other two methods.)
Recursive query:
with
  prep ( b_id, i_id, str, n, st_pos, end_pos, token ) as (
    select b_id, i_id, ',' || r_id || ',', -1, null, 1, null
    from   test_data
    union all
    select b_id, i_id, str, n+1, end_pos + 1, instr(str, ',', 1, n+3),
           substr(str, st_pos, end_pos - st_pos)
    from   prep
    where  end_pos != 0
  ),
  z ( b_id, i_id, n, token, rn ) as (
    select b_id, i_id, n, token,
           row_number() over (partition by i_id, token order by n)
    from   prep
  )
select z.b_id, z.i_id,
       listagg(t.b_id, '|') within group (order by z.n) as r_id
from   z join test_data t
       on z.token = t.i_id
where  z.rn = 1
group by z.b_id, z.i_id
;
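To make the bookkeeping columns concrete, the prep part can be run on its own (against the persistent test_data table) with a simple final select:

with
  prep ( b_id, i_id, str, n, st_pos, end_pos, token ) as (
    select b_id, i_id, ',' || r_id || ',', -1, null, 1, null
    from   test_data
    union all
    select b_id, i_id, str, n+1, end_pos + 1, instr(str, ',', 1, n+3),
           substr(str, st_pos, end_pos - st_pos)
    from   prep
    where  end_pos != 0
  )
select n, st_pos, end_pos, token
from   prep
where  i_id = 1234
order by n;

For the first sample row (str = ',1235,1237,') my hand trace of the recursion gives:

 n  st_pos  end_pos  token
-1  (null)        1  (null)    <- anchor row
 0       2        6  (null)
 1       7       11  1235
 2      12        0  1237      <- end_pos = 0 stops the recursion

The rows with a null token never match any i_id, so they simply drop out of the final join.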
Actually one can squeeze out a little bit of extra performance: in the anchor part of the recursive CTE (the first member of the union all in the definition of prep), I could start from n = 0, st_pos = 1 and end_pos = the position of the first comma (actually the second comma in the altered string; I find it a lot easier to add commas at the beginning and the end of the input CSV string and to write the recursive CTE as I did). However, this saves just one iteration out of 200 for each string, so only about 0.5% of execution time could be saved. I find the way I wrote the recursive CTE easier to follow.
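For what it's worth, one way to write that alternative anchor (my own sketch; I keep both added commas, so the starting position is 2, right after the leading comma, and the recursive member stays exactly as above) would be:

select b_id, i_id, ',' || r_id || ',', 0, 2, instr(',' || r_id || ',', ',', 1, 2), null
from   test_data

Everything else in the query is unchanged; this simply skips the first, token-less pass through the recursive member.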
For completeness, here is the modified version of the "nested table" method I used (credit @MT0):
select b_id,
       i_id,
       ( select listagg(t.b_id, '|') within group (order by rownum)
         from   table (
                  cast (
                    multiset (
                      select distinct regexp_substr(d.r_id, '[^,]+', 1, level)
                      from   dual
                      connect by level <= regexp_count(d.r_id, ',') + 1
                    )
                    as sys.odcivarchar2list
                  )
                ) v
                inner join test_data t
                on (v.column_value = t.i_id)
       )
from   test_data d;
Upvotes: 0
Reputation:
In the solution below, I use a standard technique to split each comma-separated string into components (tokens) in the factored subquery prep. Then I join back to the original table to replace each token (which is an i_id) with the corresponding b_id, and finally put the tokens back together into pipe-separated strings with listagg().
Note: This solution assumes that each r_id
has fewer than 100 tokens (see the "magic number" 100 in the definition of idx
). If it is known that each r_id
will have no more than 9 tokens, then 100 can be changed to 10 (resulting in faster processing). If NO upper bound is known beforehand, you can change 100 to some ridiculously large number; 4000 will do if r_id
is anything but a CLOB, as VARCHAR2 and such have a limit of 4000 characters.
Thanks to MT0 for reminding me to add this note.
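To see the splitting step on its own before the full query, here is the same regexp/idx technique applied to a single literal (one of the sample r_id values):

with idx ( n ) as (
  select level from dual connect by level < 100
)
select i.n,
       regexp_substr('1234,1236,1238', '([^,]+)', 1, i.n, null, 1) as token
from   idx i
where  i.n <= regexp_count('1234,1236,1238', ',') + 1;

which returns the tokens 1234, 1236 and 1238 with n = 1, 2 and 3.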
with test_data ( b_id, i_id, r_id ) as (
       select 'W00001', 1234, '1235,1237' from dual union all
       select 'B00001', 1235, '1236,1235' from dual union all
       select 'T00001', 1236, '1235,1235,1235' from dual union all
       select 'X00001', 1237, '1234,1236,1238' from dual union all
       select 'M00001', 1238, '1238' from dual
     ),
     idx ( n ) as (
       select level from dual connect by level < 100
     ),
     prep ( b_id, i_id, n, token ) as (
       select t.b_id, t.i_id, i.n,
              regexp_substr(t.r_id, '([^,]+)', 1, i.n, null, 1)
       from   test_data t join idx i
              on i.n <= regexp_count(t.r_id, ',') + 1
     )
select p.b_id, p.i_id,
       listagg(t.b_id, '|') within group (order by p.n) as r_id
from   prep p join test_data t
       on p.token = t.i_id
group by p.b_id, p.i_id
order by p.i_id;
B_ID         I_ID R_ID
------ ---------- ------------------------------
W00001       1234 B00001|X00001
B00001       1235 T00001|B00001
T00001       1236 B00001|B00001|B00001
X00001       1237 W00001|T00001|M00001
M00001       1238 M00001
Added information based on further conversation with MT0.
I edited this "Added information" again based on even more conversation with MT0. Thank you MT0 for keeping me on my toes!
In the solution below I do away with the magic number 100; instead, I use a common technique for combining multiple input rows with connect by level from dual. I also show a common technique for dealing with duplicates (in the tokens obtained from the comma-separated input strings).
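Before the full query, here is just the splitting step of prep run on its own against the sample data (either the test_data table created earlier or the CTE in the query below); the comments note the purpose of the two PRIOR conditions:

select i_id, level as n,
       regexp_substr(r_id, '([^,]+)', 1, level, null, 1) as token
from   test_data
connect by level <= regexp_count(r_id, ',') + 1
       and prior r_id = r_id              -- stay within the same source row
       and prior sys_guid() is not null;  -- never null, prevents the cycle error

Without the two PRIOR conditions, CONNECT BY against a multi-row source would combine rows with one another and generate far more rows than needed.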
Query:
with
  test_data ( b_id, i_id, r_id ) as (
    select 'W00001', 1234, '1235,1237' from dual union all
    select 'B00001', 1235, '1236,1235' from dual union all
    select 'T00001', 1236, '1235,1235,1235' from dual union all
    select 'X00001', 1237, '1234,1236,1238' from dual union all
    select 'M00001', 1238, '1238' from dual
  ),
  prep ( b_id, i_id, n, token ) as (
    select b_id, i_id, level,
           regexp_substr(r_id, '([^,]+)', 1, level, null, 1)
    from   test_data t
    connect by level <= regexp_count(r_id, ',') + 1
           and prior r_id = r_id -- to only generate the rows needed
           and prior sys_guid() is not null -- this is unique, to avoid cycles
  ),
  z ( b_id, i_id, n, token, rn ) as (
    select b_id, i_id, n, token,
           row_number() over (partition by i_id, token order by n)
    from   prep
  )
select z.b_id, z.i_id,
       listagg(t.b_id, '|') within group (order by z.n) as r_id
from   z join test_data t
       on z.token = t.i_id
where  z.rn = 1
group by z.b_id, z.i_id
order by i_id;
Result:
B_ID         I_ID R_ID
------ ---------- ------------------------------
W00001       1234 B00001|X00001
B00001       1235 T00001|B00001
T00001       1236 B00001
X00001       1237 W00001|T00001|M00001
M00001       1238 M00001
5 rows selected.
Upvotes: 0