Reputation: 55
I have the following columns: CODE_LINES AND TABLE_NAME
CODE_LINES - contains the whole code line of code
do you guys can help with transforming my SUBSTR in a REGEXP_SUBSTR? Thank you
SELECT
CASE
WHEN INSTR(TRIM(ALLS.CODE_LINES), '.')>1 AND SUBSTR(TRIM(ALLS.CODE_LINES), 1, 6) in ('<FROM>','<INTO>','<RULE>')
THEN SUBSTR(TRIM(ALLS.CODE_LINES), INSTR(TRIM(ALLS.CODE_LINES), '.')+1, LENGTH(TRIM(ALLS.CODE_LINES)))
ELSE SUBSTR(TRIM(ALLS.CODE_LINES), 8, LENGTH(TRIM(ALLS.CODE_LINES))) END AS TABLE_NAME,
CODE_LINES
FROM TABLE_A
CODE_LINES TABLE_NAME
<FROM> X.T_CLIENTS WHERE TARGET_DATA ... T_CLIENTS WHERE TARGET_DATA ...
<INTO> Y.T_COSTS WHERE CLIENT= ... T_COSTS WHERE CLIENT= ...
<FROM> Z.T_FINANCE WHERE CUSTOMER ... T_FINANCE WHERE CUSTOMER ...
<INTO> Q.T_OPERATIONS WHERE NAME= ... T_OPERATIONS WHERE NAME= ...
At the moment I tried to build a REGEXP_SUBSTR but only retrieves X.T_CLIENTS. How can i modify it in order to get only T_CLIENTS?
SELECT
CASE
WHEN INSTR(TRIM(ALLS.CODE_LINES), '.')>1 AND SUBSTR(TRIM(ALLS.CODE_LINES), 1, 6) in ('<FROM>','<INTO>','<RULE>')
THEN REGEXP_SUBSTR(TRIM(ALLS.TEXT),'(\S*)(\s)', 1, 2)
ELSE SUBSTR(TRIM(ALLS.CODE_LINES), 8, LENGTH(TRIM(ALLS.CODE_LINES))) END AS TABLE_NAME,
CODE_LINES
FROM TABLE_A
After running this code i get:
TABLE_NAME
X.T_CLIENTS
Y.T_COSTS
Z.T_FINANCE
Q.T_OPERATIONS
The desired output:
TABLE_NAME
T_CLIENTS
T_COSTS
T_FINANCE
T_OPERATIONS
Upvotes: 0
Views: 1552
Reputation: 191245
More data would be useful, but if I understand correctly you're looking for something like:
regexp_replace(code_lines, '^(<[^>]+>\s*)?([^ .]+\.)?([^ ]+)(.*)$', '\3')
That creates four groups, via the parentheses. The first group is optional and gives you the <FROM>
etc. followed by any number of spaces. The second group is optional and gives you anything that isn't a space or a period, but which is followed by a period. The third is any run of characters that aren't spaces. The fourth is whatever is left. You want the third group, which represents the table name in your example.
From a previous question you may also want the schema, which is the second group. With some additional made-up data:
select code_lines,
rtrim(regexp_replace(code_lines, '^(<[^>]+>\s*)?([^ .]+\.)?([^ ]+)(.*)$', '\2'),
'.') as owner,
regexp_replace(code_lines, '^(<[^>]+>\s*)?([^ .]+\.)?([^ ]+)(.*)$', '\3')
as table_name
from table_a;
CODE_LINES OWNER TABLE_NAME
---------------------------------------- ------------------------------ ------------------------------
<FROM> X.T_CLIENTS WHERE TARGET_DATA ... X T_CLIENTS
<INTO> Y.T_COSTS WHERE CLIENT= ... Y T_COSTS
<FROM> Z.T_FINANCE WHERE CUSTOMER ... Z T_FINANCE
<INTO> Q.T_OPERATIONS WHERE NAME= ... Q T_OPERATIONS
<RULE> Q.T_OPERATIONS WHERE NAME= ... Q T_OPERATIONS
<FROM> T_FINANCE WHERE CUSTOMER ... T_FINANCE
<OTHER> T_FINANCE WHERE CUSTOMER ... T_FINANCE
<OTHER> Z.T_FINANCE WHERE CUSTOMER ... Z T_FINANCE
T_FINANCE WHERE CUSTOMER ... T_FINANCE
Z.T_FINANCE WHERE CUSTOMER ... Z T_FINANCE
Or allowing for other whitespace:
select code_lines,
rtrim(regexp_replace(code_lines,
'^(<[^>]+>[[:space:]]*)?([^[:space:].]+\.)?([[:alnum:]_]+)(.*)$', '\2'), '.')
as owner,
regexp_replace(code_lines,
'^(<[^>]+>[[:space:]]*)?([^[:space:].]+\.)?([[:alnum:]_]+)(.*)$', '\3')
as table_name
from table_a;
which gets the same output.
Of course, trying to parse this kind of thing is likely to throw up edge cases. It won't be happy with some quoted identifiers, for instance, but hopefully you don't have any of those. It will hopefully give you a starting point anyway.
Upvotes: 1