Replacing Text which does not match a pattern in Oracle

Question

I have below text in a CLOB in a table

Table Name: tbl1
Columns
col1 - number (Primary Key)
col2 - clob (as below)

Row#1
-----
Col1 = 1
Col2 =
1331882981,ab123456,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqr123223,Some more text...

Row#2
-----
Col1 = 2
Col2 =
1331882981,abc333,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqrs23,Some more text...

Now I need to know how we can get below output

Col1 Value
---- ---------------------
1 1331882981,ab123456
1 1331890329,pqr123223
2 1331882981,abc333
2 1331890329,pqrs23

([0-9]{10},[a-z 0-9]+.), ==> This is the regular expression to match "1331890329,pqrs23" and I need to know how can replace which are not matching this regex and then split them into multiple rows

EDIT#1
I am on Oracle 10.2.0.5.0 and hence cannot use REGEXP_COUNT function :-( Also, the col2 is a CLOB which is massive

EDIT#2
I've tried below query and it works fine for some records (i.e. if I add a "where" clause). But when I remove the "where", it never returns any result. I've tried to put this into a view and insert into a table and left it run overnight but still it had not completed :(

with t as (select col1, col2 from temp_table)
select col1,
       cast(substr(regexp_substr(col2, '[^~]+', 1, level), 1, 50) as
            varchar2(50)) data
  from t
connect by level <= length(col2) - length(replace(col2, '~')) + 1

EDIT#3

# of Chars in Clob      Total
-----------             -----
0 - 1k                  3196
1k - 5k                 2865
5k - 25k                 661
25k - 100k                36
> 100k                     2
-----------             -----
Grand Total             6760

I have ~7k rows of clobs which have the distribution as shown above...

S.Krishna · Accepted Answer

The above solutions didn't work and below is what I did.

update temp_table set col2=regexp_replace(col2,'([0-9]{10},[a-z0-9]+)','(\1)') ;
update temp_table set col2=regexp_replace(col2,'\),[\s\S]*~$','(\1)$');
update temp_table set col2=regexp_replace(col2,'$.*?$','$');
update temp_table set col2=replace(regexp_replace(col2,'$.*',''),'(','');

After these 4 update commands, the col2 will have something like

1 1331882981,ab123456$1331890329,pqr123223
2 1331882981,abc333$1331890329,pqrs23

Then I wrote a function to split this thing. The reason I went for the function is to split by "$" and the fact that the col2 still has >10k characters

create or replace function parse( p_clob in clob ) return sys.odciVarchar2List
pipelined
as
        l_offset number := 1;
        l_clob   clob := translate( p_clob, chr(13)|| chr(10) || chr(9), '   ' ) || '$';
        l_hit    number;
begin
        loop
          --Find occurance of "$" from l_offset
          l_hit := instr( l_clob, '$', l_offset );
          exit when nvl(l_hit,0) = 0;
          --Extract string from l_offset to l_hit
          pipe row ( substr(l_clob, l_offset , (l_hit - l_offset)) );
          --Move offset
          l_offset := l_hit+1;
        end loop;
end;

I then called

select col1,
       REGEXP_SUBSTR(column_value, '[^,]+', 1, 1) col3,
       REGEXP_SUBSTR(column_value, '[^,]+', 1, 2) col4
  from temp_table, table(parse(temp_table.col2));

Replacing Text which does not match a pattern in Oracle

Answers (2)

Related Questions