Fanga
Fanga

Reputation: 77

How to process this string by regex in oracle plsql

My string (varchar2): 1111 = 2222;3333 in 4444;5555 sum (6666;7777) I want to replace in my string ";" through " ; " except the ";" in parentheses.

Expected result:

1111 = 2222 ; 3333 in 4444 ; 5555 sum (6666;7777)

How to handle this case with the regex_replace function?

I want my regex to say: target them; and exclude them; in parentheses I know targeting a character is just indicating the character. The syntax for excluding a character is [^character_list] so for me is => [^;]

I tried this: regex_replace (myString, ';([^\(.;.\)]', ';');

But it doesn't work

link for a documentation : https://docs.oracle.com/cd/B12037_01/appdev.101/b10795/adfns_re.htm

Edit: sorry guys I made some mistake in my request. I corrected the description above. I only wish replace ";" trough with " ; " except ";" in parentesis. The regular expression for that in function regex_replace. I already have a function to split after. Sorry for that.

Upvotes: 1

Views: 874

Answers (2)

MT0
MT0

Reputation: 168041

You can use a recursive sub-query factoring clause without any regular expressions and which correctly handles nested braces (such as (abc; (def;fff); zzyzx)):

WITH char_groups (
  id, value, idx, len, grp, grp_start, is_end, brace_level
) AS (
  SELECT id,
         value,
         DECODE( SUBSTR( value, 1, 1 ), ';', 0, 1 ),
         LENGTH( value ),
         1,
         DECODE( SUBSTR( value, 1, 1 ), ';', 0, 1 ),
         CASE
         WHEN ';' IN ( SUBSTR( value, 1, 1 ), SUBSTR( value, 2, 1 ) )
         OR   value IS NULL
         THEN 1
         ELSE 0
         END,
         DECODE( SUBSTR( value, 1, 1 ), '(', 1, 0 )
  FROM   table_name
UNION ALL
  SELECT id,
         value,
         idx + 1,
         len,
         CASE
         WHEN brace_level = 0 AND SUBSTR( value, idx + 1, 1 ) = ';'
         THEN grp + 1
         ELSE grp
         END,
         CASE
         WHEN brace_level = 0 AND SUBSTR( value, idx + 1, 1 ) = ';'
         THEN idx + 1
         ELSE grp_start
         END,
         CASE
         WHEN (
                (
                  ( brace_level = 0 AND SUBSTR( value, idx + 1, 1 ) <> '(' )
                  OR
                  ( brace_level = 1 AND SUBSTR( value, idx + 1, 1 ) = ')' )
                )
                AND
                SUBSTR( value, idx + 2, 1 ) = ';'
              )
         OR   idx + 1 = len
         THEN 1
         ELSE 0
         END,
         GREATEST(
           brace_level + DECODE( SUBSTR( value, idx + 1, 1 ), '(', 1, ')', -1, 0 ),
           0
         )
  FROM   char_groups
  WHERE  idx < len
)
SELECT id,
       grp,
       LTRIM( SUBSTR( value, grp_start, idx - grp_start + 1 ), ';' ) AS term
FROM   char_groups cg
WHERE  is_end = 1
ORDER BY id, grp

Which, for the sample data:

CREATE TABLE table_name ( id, value ) AS
SELECT 1, '1111 = 2222; 3333 in 4444; 5555 sum (6666; 7777)' FROM DUAL UNION ALL
SELECT 2, 'ABCD'                    FROM DUAL UNION ALL
SELECT 3, 'A;;C;D;;F(F;F;F);'       FROM DUAL UNION ALL
SELECT 4, NULL                      FROM DUAL UNION ALL
SELECT 5, 'abc; (e;f) = ( e; f)'    FROM DUAL UNION ALL
SELECT 6, '(abc; (def;fff); zzyzx)' FROM DUAL UNION ALL
SELECT 7, '(((abc)));(def));((ghi;jkl);mno);pqr' FROM DUAL UNION ALL
SELECT 8, ';'                       FROM DUAL UNION ALL
SELECT 9, ';A;'                     FROM DUAL UNION ALL
SELECT 10, '()(;);(;)(;)'           FROM DUAL;

Outputs (including NULL values for when you have two adjacent ; delimiters):

ID | GRP | TERM                   
-: | --: | :----------------------
 1 |   1 | 1111 = 2222            
 1 |   2 |  3333 in 4444          
 1 |   3 |  5555 sum (6666; 7777) 
 2 |   1 | ABCD                   
 3 |   1 | A                      
 3 |   2 | null                   
 3 |   3 | C                      
 3 |   4 | D                      
 3 |   5 | null                   
 3 |   6 | F(F;F;F)               
 3 |   7 | null                   
 4 |   1 | null                   
 5 |   1 | abc                    
 5 |   2 |  (e;f) = ( e; f)       
 6 |   1 | (abc; (def;fff); zzyzx)
 7 |   1 | (((abc)))              
 7 |   2 | (def))                 
 7 |   3 | ((ghi;jkl);mno)        
 7 |   4 | pqr                    
 8 |   1 | null                   
 8 |   2 | null                   
 9 |   1 | null                   
 9 |   2 | A                      
 9 |   3 | null                   
10 |   1 | ()(;)                  
10 |   2 | (;)(;)                 

db<>fiddle here


Previous Version:

Will work if you have single, balanced parentheses:

WITH terms ( id, value, idx, term, spos, epos ) AS (
  SELECT id,
         value,
         1,
         REGEXP_SUBSTR( value, '(([^;(]|\(.*?\))*)(;|$)', 1, 1, NULL, 1 ),
         1,
         REGEXP_INSTR( value, '(([^;(]|\(.*?\))*)(;|$)', 1, 1, 1 )
  FROM   table_name
UNION ALL
  SELECT id,
         value,
         idx + 1,
         REGEXP_SUBSTR( value, '(([^;(]|\(.*?\))*)(;|$)', epos, 1, NULL, 1 ),
         epos,
         REGEXP_INSTR( value, '(([^;(]|\(.*?\))*)(;|$)', epos, 1, 1 )
  FROM   terms
  WHERE  epos > 0
)
SELECT id, idx, term
FROM   terms
WHERE  SUBSTR( value, -1 ) = ';'
OR     spos < LENGTH( value )
OR     value IS NULL
ORDER BY id, idx;

Which outputs:

ID | IDX | TERM                  
-: | --: | :---------------------
 1 |   1 | 1111 = 2222           
 1 |   2 |  3333 in 4444         
 1 |   3 |  5555 sum (6666; 7777)
 2 |   1 | ABCD                  
 3 |   1 | A                     
 3 |   2 | null                  
 3 |   3 | C                     
 3 |   4 | D                     
 3 |   5 | null                  
 3 |   6 | F(F;F;F)              
 3 |   7 | null                  
 4 |   1 | null                  
 5 |   1 | abc                   
 5 |   2 |  (e;f) = ( e; f)      
 6 |   1 | (abc; (def;fff)       
 6 |   2 |  zzyzx)               
 7 |   1 | (((abc)))             
 7 |   2 | (def))                
 7 |   3 | ((ghi;jkl)            
 7 |   4 | mno)                  
 7 |   5 | pqr                   
 8 |   1 | null                  
 8 |   2 | null                  
 9 |   1 | null                  
 9 |   2 | A                     
 9 |   3 | null                  
10 |   1 | ()(;)                 
10 |   2 | (;)(;)                

(Note: it doesn't work for examples 6 & 7 where there are multiple nested parentheses but does work or the simpler test cases.)

db<>fiddle here


Update

I only wish replace ";" trough with " ; " except ";" in parenthesis.

You can use:

WITH char_groups (
  id, value, idx, len, grp, brace_level
) AS (
  SELECT id,
         value,
         CASE SUBSTR( value, 1, 1 ) WHEN ';' THEN 0 ELSE 1 END,
         LENGTH( value ),
         1,
         CASE SUBSTR( value, 1, 1 ) WHEN '(' THEN 1 ELSE 0 END
  FROM   table_name
UNION ALL
  SELECT id,
         CASE
         WHEN brace_level = 0 AND SUBSTR( value, idx + 1, 1 ) = ';'
         THEN SUBSTR( value, 1, idx ) || ' ; ' || SUBSTR( value, idx + 2 )
         ELSE value
         END,
         CASE
         WHEN brace_level = 0 AND SUBSTR( value, idx + 1, 1 ) = ';'
         THEN idx + 3
         ELSE idx + 1
         END,
         CASE
         WHEN brace_level = 0 AND SUBSTR( value, idx + 1, 1 ) = ';'
         THEN len + 2
         ELSE len
         END,
         CASE
         WHEN brace_level = 0 AND SUBSTR( value, idx + 1, 1 ) = ';'
         THEN grp + 1
         ELSE grp
         END,
         GREATEST(
           brace_level
           + CASE SUBSTR( value, idx + 1, 1 )
             WHEN '(' THEN 1
             WHEN ')' THEN -1
             ELSE 0
             END,
           0
         )
  FROM   char_groups
  WHERE  idx < len
)
SELECT id,
       value
FROM   char_groups
WHERE  idx = len OR value IS NULL
ORDER BY id, idx

And, for the sample data:

CREATE TABLE table_name ( id, value ) AS
SELECT 1, '1111 = 2222; 3333 in 4444; 5555 sum (6666; 7777)' FROM DUAL UNION ALL
SELECT 2, 'ABCD'                    FROM DUAL UNION ALL
SELECT 3, 'A;;C;D;;F(F;F;F);'       FROM DUAL UNION ALL
SELECT 4, NULL                      FROM DUAL UNION ALL
SELECT 5, 'abc; (e;f) = ( e; f)'    FROM DUAL UNION ALL
SELECT 6, '(abc; (def;fff); zzyzx)' FROM DUAL UNION ALL
SELECT 7, '(((abc)));(def));((ghi;jkl);mno);pqr' FROM DUAL UNION ALL
SELECT 8, ';'                       FROM DUAL UNION ALL
SELECT 9, ';A;'                     FROM DUAL UNION ALL
SELECT 10, '()(;);(;)(;)'           FROM DUAL UNION ALL
SELECT 11, '(a ; b);(c ; d)'        FROM DUAL;

This outputs:

ID | VALUE                                               
-: | :---------------------------------------------------
 1 | 1111 = 2222 ;  3333 in 4444 ;  5555 sum (6666; 7777)
 2 | ABCD                                                
 3 | A ;  ; C ; D ;  ; F(F;F;F) ;                        
 4 | null                                                
 5 | abc ;  (e;f) = ( e; f)                              
 6 | (abc; (def;fff); zzyzx)                             
 7 | (((abc))) ; (def)) ; ((ghi;jkl);mno) ; pqr          
 8 |  ;                                                  
 9 |  ; A ;                                              
10 | ()(;) ; (;)(;)                                      
11 | (a ; b) ; (c ; d)                                   

Or, for only single nested braces, you can use:

SELECT id,
       RTRIM( REGEXP_REPLACE( value, '(([^;(]|\(.*?\))*)(;|$)', '\1 \3 ' ) )
       || CASE SUBSTR( value, -1 ) WHEN ';' THEN ' ' END AS value
FROM   table_name

Which outputs:

ID | VALUE                                               
-: | :---------------------------------------------------
 1 | 1111 = 2222 ;  3333 in 4444 ;  5555 sum (6666; 7777)
 2 | ABCD                                                
 3 | A ;  ; C ; D ;  ; F(F;F;F) ;                        
 4 | null                                                
 5 | abc ;  (e;f) = ( e; f)                              
 6 | (abc; (def;fff) ;  zzyzx)                           
 7 | (((abc))) ; (def)) ; ((ghi;jkl) ; mno) ; pqr        
 8 |  ;                                                  
 9 |  ; A ;                                              
10 | ()(;) ; (;)(;)                                      
11 | (a ; b) ; (c ; d)                                   

db<>fiddle here

I already have a function to split after.

Please check your split function against case #11.

Upvotes: 1

user5683823
user5683823

Reputation:

Here is one way to do this. The idea is simple: for each token, collect either an opening parenthesis, then any number of non-closing-parenthesis characters, then a closing parenthesis, OR a single non-semicolon. Collect ANY NUMBER of consecutive occurrences of the fragments I just described, followed either by semicolon or by the end of the string. That's a "token".

Note that this will not work correctly if you may have nested parentheses. To illustrate that, I included an example in my sample data. I also included a row where the "value" is NULL from the outset; presumably that should still produce a row in the output, with NULL token.

If my testing is correct, this will produce the same answer as MT0's, except when the input string is NULL (his solution won't produce any output for such inputs).

Sample data

create table sample_data (id, value) as
  select 1, '1111 = 2222; 3333 in 4444; 5555 sum (6666; 7777)'
                                      from dual union all
  select 2, 'ABCD'                    from dual union all
  select 3, null                      from dual union all
  select 4, 'A;;C;D;;F(F;F;F);'       from dual union all
  select 5, 'abc; (e;f) = ( e; f)'    from dual union all
  select 6, '(abc; (def;fff); zzyzx)' from dual
;

Query and output

select i.id, t.ord, t.token
from   sample_data i
       cross apply
       (
         select  level as ord,
                 regexp_substr(i.value, '((\([^)]*\)|[^;])*)(;|$)', 1,
                               level, null, 1) as token
         from    dual
         connect by level <= regexp_count(i.value, '((\([^)]*?\)|[^;])*)(;|$)')
                             - case when i.value like '%;' then 0 else 1 end
       ) t
;

ID ORD TOKEN                 
-- --- ----------------------
 1   1 1111 = 2222           
 1   2  3333 in 4444         
 1   3  5555 sum (6666; 7777)
 2   1 ABCD                  
 3   1                       
 4   1 A                     
 4   2                       
 4   3 C                     
 4   4 D                     
 4   5                       
 4   6 F(F;F;F)              
 4   7                       
 5   1 abc                   
 5   2  (e;f) = ( e; f)      
 6   1 (abc; (def;fff)     -- NOT the required result, when there are     
 6   2  zzyzx)             -- nested parentheses in the input!

Upvotes: 0

Related Questions