Reputation: 693
I have CLOB data like this:
123456 (LED TV); 234543 (LED light); 654876 (LED monitor);
Now I need to extract all 6-digit numbers (invoice tracking number in my case) from CLOB using delimiter ( in my case ";" ) and select that for Insert into target table but only If record doesn't exist.
I've seen couple of examples with Instr & Substr or Regexp, but nothing what I need or beyond my understanding of Oracle. Can somebody show me an example of how to split CLOB into rows based on string inside CLOB, to make use of It for later Insert ?
P.S.: I would prefer fastest solution since my CLOB data could contain over 5 million of invoice records. It's going to be a stored procedure fired from C# at the end, but this part is causing me headaches... For any help - thanks in advance !
Upvotes: 0
Views: 1270
Reputation: 35900
I have tried to complete the task using DBMS_LOB
package to convert them to strings divided by ";" and then did some string operations on it to achieve the result.
Try following:
INSERT INTO INVOICE_CATEGORIZED
SELECT TAB.INVOICE_NUMBER, TAB.INVOICE_NAME FROM
(SELECT
TRIM(dbms_lob.SUBSTR(INVOICE_INN,6 ,1)) AS INVOICE_NUMBER,
SUBSTR(INVOICE_INN,
INSTR(INVOICE_INN, '(') + 1,
INSTR(INVOICE_INN, ')') - INSTR(INVOICE_INN, '(') - 1 )
AS INVOICE_NAME
-- HERE INVOICE_INN IS STRING NOW, SO WE CAN DO STRING OPERATIONS ON IT ONWARD
FROM
(
-- DIVIDING ; SEPARATED CLOB TO INDIVIDUAL STRING
SELECT
TRIM(CASE WHEN INVOICE_SINGLE.COLUMN_VALUE = 1 THEN
dbms_lob.SUBSTR(INVOICE,
dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE) - 1,
1
)
ELSE
dbms_lob.SUBSTR(INVOICE,
dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE) - 1
- dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE - 1),
dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE - 1) + 1)
END) AS INVOICE_INN
FROM
INVOICES T,
TABLE ( CAST(MULTISET(
SELECT
LEVEL
FROM
DUAL
CONNECT BY
dbms_lob.INSTR(INVOICE,';',1,LEVEL) <> 0
) AS SYS.ODCINUMBERLIST) ) INVOICE_SINGLE)) TAB
WHERE NOT EXISTS (SELECT 1 FROM INVOICE_CATEGORIZED IC
WHERE IC.INVOICE_NUMBER = TAB.INVOICE_NUMBER
AND IC.INVOICE_NAME = TAB.INVOICE_NAME)
Cheers!!
Upvotes: 1
Reputation: 142720
Here's an example.
Test case first; test
table contains source data:
SQL> create table test (col clob);
Table created.
SQL> insert into test
2 select '123456 (LED TV); 234543 (LED light); 654876 (LED monitor);' from dual union all
3 select '665988 (Notebook); 987654 (Mouse); 445577 (Dead Pixel);' from dual;
2 rows created.
SQL>
Target
table will contain values extracted from the source:
SQL> create table target (itn number, name varchar2(20));
Table created.
SQL> -- This value shouldn't be inserted as it already exists in the TARGET table:
SQL> insert into target values (234543, 'LED light');
1 row created.
SQL>
Now, something useful. The idea is to split column values to rows (that's what regexp_substr
part in hierarchical query does, and then separate the ID value from the name (which is enclosed into brackets). Values that exist in the target table shouldn't be inserted (so query is supposed to insert 5 rows):
SQL> insert into target (itn, name)
2 with
3 c2r as
4 -- split column to rows, e.g. "123456 (LED TV)" is an example of such a row
5 (select to_char(trim(regexp_substr(col, '[^;]+', 1, column_value))) val
6 from test join table(cast(multiset(select level from dual
7 connect by level <= regexp_count(col, ';')
8 ) as sys.odcinumberlist)) on 1 = 1
9 ),
10 sep as
11 -- separate ITN (invoice tracking nubmer) and NAME
12 (select substr(val, 1, instr(val, ' ') - 1) itn,
13 substr(val, instr(val, ' ') + 1) name
14 from c2r
15 )
16 select s.itn, replace(replace(s.name, '(', ''), ')', '')
17 from sep s
18 -- don't insert values that already exist in the TARGET table
19 where not exists (select null from target t
20 where t.itn = s.itn
21 );
5 rows created.
SQL>
The final result:
SQL> select * From target;
ITN NAME
---------- --------------------
234543 LED light
123456 LED TV
654876 LED monitor
665988 Notebook
987654 Mouse
445577 Dead Pixel
6 rows selected.
SQL>
Upvotes: 0