Casey
Casey

Reputation: 223

Extracting columns and table names from an oracle query that is based on dynamic SQL

I want an approach/code snippet to extract column names and the corresponding table name from an oracle query. The queries and consequently the columns and table names change at run time and some of the column names usually are calculated meaning they are wrapped in a function and aliased. I tried different string tokenizing techniques using regexp to separate this out as per the expeted output, but so far, no luck ! Eg:

select mandate_name, investment_sub_team_name,
fn_sum(REG_INV_CMP_AUM) REG_INV_CMP_AUM, 
fn_sum(NON_REG_INV_CMP_AUM) NON_REG_INV_CMP_AUM
from DM_SAI_VALUATIONS_STEP3
where position_interval_type = 'E'
and position_type = 'T'
group by mandate_name, investment_sub_team_name;

I want the output for the columns as:

mandate_name
investment_sub_team_name
fn_sum(REG_INV_CMP_AUM)
fn_sum(NON_REG_INV_CMP_AUM)

Note above: I want the columns with the function and not the alias

I want the output for the table name as: DM_SAI_VALUATIONS_STEP3 against all the columns that I listed above

I cannot edit the queries as they are part of an xml output. So, i cannot change the alias. The second point is to just extract the table name from the query. Please consider the fact that nothing can be hard coded like position of the string token etc as the queries containing the columns and the table would be different. I am looking for a generic approach to tokenize them. So, against the column output that I expect, i just need the table name as well..Its always going to only one table in the from clause, so extracting that would not be an issue.

Expected output:

Column Name                          Table Name
-----------                          ----------
mandate_name                         DM_SAI_VALUATIONS_STEP3
investment_sub_team_name             DM_SAI_VALUATIONS_STEP3
fn_sum(REG_INV_CMP_AUM)              DM_SAI_VALUATIONS_STEP3
fn_sum(NON_REG_INV_CMP_AUM)          DM_SAI_VALUATIONS_STEP3

Any help pr pointers would be much appreciated.

Upvotes: 1

Views: 3956

Answers (3)

hol
hol

Reputation: 8423

I also agree it is generally not possible. But maybe the solution is to get in touch with the creator of the XML message and agree on a different protocol then a finished up SELECT statement beforehand. Agree with him sending the columns.

If this is not possible and you want to make certain assumptions about how the query is built then you can tokenize after the selectand before from by using the , as a separator. But by all I know you can not really do that by regular expression substring commands. I think you need a bit of PL/SQL function written.

But still take care from keyword could be somewhere part of the columns selecting. What do you do if you suddenly get a query like this:

select 
  something, 
  (select count(*) from othertable) as cnt, 
  andfromthiscolumn xyz 
from mytable

So my tip here is to rather sort it out organizationally then trying to code the impossible.

Upvotes: 1

If you know that the structure of your query strings will not change much, you can do something like this:

set serveroutput on
set termout on
clear
declare
  v_str varchar2(500) := 'select mandate_name, investment_sub_team_name,
fn_sum(REG_INV_CMP_AUM) REG_INV_CMP_AUM, 
fn_sum(NON_REG_INV_CMP_AUM) NON_REG_INV_CMP_AUM
from DM_SAI_VALUATIONS_STEP3
where position_interval_type = ''E''
and position_type = ''T''
group by mandate_name, investment_sub_team_name;';
  v_tmp varchar2(500);
  v_cols varchar2(500);
  v_table varchar2(500);
begin
  v_tmp := replace( v_str, 'select ','');
  v_tmp := substr( v_tmp, 1, instr(v_tmp, 'where')-1);

  dbms_output.put_line('original query: '||v_str);  

  v_cols := substr (v_tmp, 1, instr(v_tmp, 'from')-1);
  dbms_output.put_line('column names: '||v_cols);


  v_table := substr(v_tmp, instr(v_tmp, 'from ')+4, 500);
  dbms_output.put_line('table name: '||v_table);
end;

Upvotes: 0

Justin Cave
Justin Cave

Reputation: 231781

You realistically can't solve this problem in general without writing your own SQL compiler (at least the parser and lexer up through the semantic analysis phase). That is a non-trivial exercise particularly if you want to accept any valid Oracle SQL query. Oracle Corporation used to have different SQL parsers for the SQL VM and the PL/SQL VM and couldn't keep them in sync-- it's a major investment of time and effort to keep evolving your parser as the supported SQL grammar improves.

If you're really determined to go down this path, you can start with some of the ANTLR SQL grammars. The O'Reilly Flex and Bison book also has a chapter on parsing SQL that you could potentially use as a starting point. Of course, you'll need to revise and extend the grammars to support whatever SQL features your queries might contain. You'll then need to build the syntax analyzer and semantic analysis portions of the compiler to implement the appropriate scope resolution rules to be able to figure out which table a particular reference to a particular column comes from. Just to reiterate, this is a non-trivial exercise and it's one that has to be enhanced for every new release of the database.

If you can relax your requirements and make some assumptions about what sorts of queries you're going to be seeing, it becomes much easier to write a parser. If you can guarantee that every query references exactly 1 table, identifying which table a particular column comes from is much easier. If you can guarantee that every function call takes at most one column from one table as a parameter, that also makes things easier-- otherwise, you'll need to figure out what you want to return for the table name if the column is the result of a function that takes as arguments columns from multiple tables.

Upvotes: 3

Related Questions