Error_2646
Error_2646

Reputation: 3781

Parametrize columns in SELECT statement

I'm looking to write a query, to be used in a macro that compares two tables with the same column names and outputs the percent match of each field. I'd like the macro to take the table names as input.

ex for two static tables.

SELECT 
 SUM(CASE WHEN table1.field1 = table2.field1 THEN 1 ELSE 0 END)/SUM(1)
,SUM(CASE WHEN table1.field2 = table2.field2 THEN 1 ELSE 0 END)/SUM(1)
,SUM(CASE WHEN table1.field3 = table2.field3 THEN 1 ELSE 0 END)/SUM(1)
....
....
,SUM(CASE WHEN table1.fieldN = table2.fieldN THEN 1 ELSE 0 END)/SUM(1)
FROM table1
INNER JOIN table2
    ON table1.keyField = table2.keyField

Is it possible to write a macro which generalizes this?

For example, the psuedo query might look something like this:

CREATE MACRO compareTables (table1 varChar(50),table2 varChar(50),keyField AS varchar(50)) AS (
    WITH sharedColumns (columnName) AS (
        SELECT columnname
        FROM dbc.columns
        WHERE
            tableName = :table1
        INTERSECT
        SELECT columnname
        FROM dbc.columns
        WHERE
            tableName = :table2)
  SELECT 
   SUM (
       CASE 
           WHEN :table1.<sharedColumn[1]> = :table2.<sharedColumn[1]
           THEN 1
           ELSE 0
       END)/SUM(1)
  ....
   SUM (
       CASE 
           WHEN :table1.<sharedColumn[N]> = :table2.<sharedColumn[N]
           THEN 1
           ELSE 0
       END)/SUM(1)
  FROM :table1
  INNER JOIN :table2
      ON :table1.:keyField = :table2.:keyField;);

Is there any way to accomplish this in Teradata without a UDF (I don't have create function permissions). If that's the only way then I can put in a request, but I'd prefer not to if it can be avoided.

Upvotes: 1

Views: 1364

Answers (1)

JNevill
JNevill

Reputation: 50019

I usually don't write all the code out here on SO, but as I thought about this one it sounded like fun.

Like I noted in the comments to your question, you can't do this in a Macro since you can't use a macro parameter as database object. They only work for values in your data. So:

 Select * From Table Where F1= :myparam;

is cool in a Macro, but:

 Select * From Table Where :myparam = 'somevalue';

isn't allowed.


However, you can do this in a Stored Procedure or in whatever scripting language you like.

The problem is that you have two problems.

  1. You need a list of columns for your tables that then have to be used in the creation of your comparison query.
  2. You have to dynamically build your comparison query based on the list of columns and the two parameters that hold your table names and one parameter that holds your key field.

Neither of these requirements is trivial, but something like the following should do the job for you. It may need some tweaking, but I think it's close:

CREATE PROCEDURE compareTables 
(
    IN table1 varChar(50),
    IN table2 varChar(50),
    IN keyField varchar(50),
    OUT dynamicallyCreatedSQL VARCHAR(10000)
) 
DYNAMIC RESULT SETS 1

BEGIN

    DECLARE outputSQLStatement VARCHAR(10000); --variable to hold your dynamically created sql statement that will produce the record set that we are outputting form this SP
    DECLARE columnSQLStatement VARCHAR(500); --variable to hold your dynamically created sql statement that will hold the columns in Table1
    DECLARE columnName VARCHAR(30); --Variable to stick the column name that we get from the column_cursor  
    DECLARE output_cursor CURSOR WITH RETURN ONLY FOR output_statement; --The dynamically created cursor that will hold your record set produced by outputSQLStatement
    DECLARE column_cursor CURSOR FOR column_statement; --The dynamically created cursor that will hold your record set produced by columnSQLStatement

    --The start of your dynamic output sql statement:
    SET outputSQLStatement = '
                SELECT ';   

    --SQL Statement for your dynamically created cursor to get the columns for your table
    --TODO: Change "YourDatabaseHere" to your database...
    SET columnSQLStatement = 'SELECT ColumnName FROM "DBC".Columns WHERE DATABASE=''YourDatabaseHere'' AND TableName=''' || table1 || ''';';

    --Prepare the dynamically generated column SQL statement for cursor.
    Prepare column_statement FROM columnSQLStatement;

    --Open the cursor and Loop through each record
    OPEN column_cursor;
    LABEL1: 
    LOOP

        --WOAH THERE! No data was returned. Much sorries.
        -- If there is no data, this thing is going to hang...
        -- And... if there is no data, it means that your table probably isn't a table. You should check your parameters.
        IF (SQLSTATE ='02000') THEN
            LEAVE label1;
        END IF;

        --Grab the column name from the record into the variable columnName
        FETCH column_cursor INTO columnName;

        --Now we can build the meat of that sql statement       
        SET outputSQLStatement = outputSQLStatement || '
                SUM (
                   CASE 
                       WHEN ' || table1 || '."' || columnName || '" = ' || table2 || '."' || columnName || '"
                       THEN 1
                       ELSE 0
                   END)/SUM(1) as "' || columnName || '",';

    --End the loop and close the cursor
    END LOOP LABEL1;
    CLOSE column_cursor;        

    --There's going to be an extra comma in there that we have to remove before the FROM part of the SQL statement, lets get rid of that:
    SET outputSQLStatement = Substring(OutputSQLStatement FROM 1 FOR Length(OutputSQLStatement) - 1);

    --Now complete the sql statement        
    Set outputSQLStatement = outputSQLStatement || '
                FROM ' || table1 || '
                INNER JOIN ' || table2 || '
                  ON ' || table1 || '.' || keyfield || ' = ' || table2 || '.' || keyfield || ';';

    --Set the output variable to the dynamically generated sql statement for debug fun.
    Set dynamicallyCreatedSQL = outputSQLStatement;

    --And finally... execute the statement by prepping it and opening the cursor. 
    --  we don't close the cursor so that the "Dynamic Result Sets 1" catches it and returns it to whatever calls this procedure.
    PREPARE output_statement FROM outputSQLStatement;
    OPEN output_cursor;

END;

You can call this like:

CALL compareTables('table1', 'table2', 'yourkeyfield', output);

That will return two recordsets. The first one will have the dynamically created SQL statement you can use for debugging. The second will be the record set you are after.

If you don't have CREATE PROCEDURE access, then this is a wash. But, regardless, this would be the method you would have to use whether it's in a Teradata SP, bash with BTEQ, or some other scripting language like VBScript through ADO/ODBC or whatever.

I tried to comment it well so each part is explained, but there are some complicated things happening between the use of cursors for two different purposes (looping through a result set, and opening a result set for output from the procedure) and the dynamic sql being generated based on inputs and columns found in dbc.columns.

Upvotes: 1

Related Questions