John
John

Reputation: 55

ORACLE SQL: Builld a REGEXP_SUBSTR in a CASE statement

I have the following columns: CODE_LINES AND TABLE_NAME

CODE_LINES - contains the whole code line of code

At the moment I tried to build a REGEXP_SUBSTR but only retrieves X.T_CLIENTS. How can i modify it in order to get only T_CLIENTS?

    SELECT 
    CASE 
    WHEN INSTR(TRIM(ALLS.CODE_LINES), '.')>1 AND SUBSTR(TRIM(ALLS.CODE_LINES), 1, 6) in ('<FROM>','<INTO>','<RULE>') 
    THEN  REGEXP_SUBSTR(TRIM(ALLS.TEXT),'(\S*)(\s)', 1, 2)
    ELSE SUBSTR(TRIM(ALLS.CODE_LINES), 8, LENGTH(TRIM(ALLS.CODE_LINES))) END AS TABLE_NAME,
    CODE_LINES
    FROM TABLE_A

After running this code i get:

    TABLE_NAME
    X.T_CLIENTS
    Y.T_COSTS
    Z.T_FINANCE
    Q.T_OPERATIONS

The desired output:

    TABLE_NAME
    T_CLIENTS
    T_COSTS
    T_FINANCE
    T_OPERATIONS

Upvotes: 0

Views: 1552

Answers (1)

Alex Poole
Alex Poole

Reputation: 191245

More data would be useful, but if I understand correctly you're looking for something like:

regexp_replace(code_lines, '^(<[^>]+>\s*)?([^ .]+\.)?([^ ]+)(.*)$', '\3')

That creates four groups, via the parentheses. The first group is optional and gives you the <FROM> etc. followed by any number of spaces. The second group is optional and gives you anything that isn't a space or a period, but which is followed by a period. The third is any run of characters that aren't spaces. The fourth is whatever is left. You want the third group, which represents the table name in your example.

From a previous question you may also want the schema, which is the second group. With some additional made-up data:

select code_lines,
  rtrim(regexp_replace(code_lines, '^(<[^>]+>\s*)?([^ .]+\.)?([^ ]+)(.*)$', '\2'),
    '.') as owner,
  regexp_replace(code_lines, '^(<[^>]+>\s*)?([^ .]+\.)?([^ ]+)(.*)$', '\3')
    as table_name
from table_a;

CODE_LINES                               OWNER                          TABLE_NAME                    
---------------------------------------- ------------------------------ ------------------------------
<FROM> X.T_CLIENTS WHERE TARGET_DATA ... X                              T_CLIENTS                     
<INTO> Y.T_COSTS WHERE CLIENT= ...       Y                              T_COSTS                       
<FROM> Z.T_FINANCE WHERE CUSTOMER ...    Z                              T_FINANCE                     
<INTO> Q.T_OPERATIONS WHERE NAME= ...    Q                              T_OPERATIONS                  
<RULE> Q.T_OPERATIONS WHERE NAME= ...    Q                              T_OPERATIONS                  
<FROM> T_FINANCE WHERE CUSTOMER ...                                     T_FINANCE                     
<OTHER> T_FINANCE WHERE CUSTOMER ...                                    T_FINANCE                     
<OTHER> Z.T_FINANCE WHERE CUSTOMER ...   Z                              T_FINANCE                     
T_FINANCE WHERE CUSTOMER ...                                            T_FINANCE                     
Z.T_FINANCE WHERE CUSTOMER ...           Z                              T_FINANCE                     

Or allowing for other whitespace:

select code_lines,
  rtrim(regexp_replace(code_lines,
    '^(<[^>]+>[[:space:]]*)?([^[:space:].]+\.)?([[:alnum:]_]+)(.*)$', '\2'), '.')
    as owner,
  regexp_replace(code_lines,
    '^(<[^>]+>[[:space:]]*)?([^[:space:].]+\.)?([[:alnum:]_]+)(.*)$', '\3')
    as table_name
from table_a;

which gets the same output.

Of course, trying to parse this kind of thing is likely to throw up edge cases. It won't be happy with some quoted identifiers, for instance, but hopefully you don't have any of those. It will hopefully give you a starting point anyway.

Upvotes: 1

Related Questions