Reputation: 3781
I'm looking to write a query, to be used in a macro that compares two tables with the same column names and outputs the percent match of each field. I'd like the macro to take the table names as input.
ex for two static tables.
SELECT
SUM(CASE WHEN table1.field1 = table2.field1 THEN 1 ELSE 0 END)/SUM(1)
,SUM(CASE WHEN table1.field2 = table2.field2 THEN 1 ELSE 0 END)/SUM(1)
,SUM(CASE WHEN table1.field3 = table2.field3 THEN 1 ELSE 0 END)/SUM(1)
....
....
,SUM(CASE WHEN table1.fieldN = table2.fieldN THEN 1 ELSE 0 END)/SUM(1)
FROM table1
INNER JOIN table2
ON table1.keyField = table2.keyField
Is it possible to write a macro which generalizes this?
For example, the psuedo query might look something like this:
CREATE MACRO compareTables (table1 varChar(50),table2 varChar(50),keyField AS varchar(50)) AS (
WITH sharedColumns (columnName) AS (
SELECT columnname
FROM dbc.columns
WHERE
tableName = :table1
INTERSECT
SELECT columnname
FROM dbc.columns
WHERE
tableName = :table2)
SELECT
SUM (
CASE
WHEN :table1.<sharedColumn[1]> = :table2.<sharedColumn[1]
THEN 1
ELSE 0
END)/SUM(1)
....
SUM (
CASE
WHEN :table1.<sharedColumn[N]> = :table2.<sharedColumn[N]
THEN 1
ELSE 0
END)/SUM(1)
FROM :table1
INNER JOIN :table2
ON :table1.:keyField = :table2.:keyField;);
Is there any way to accomplish this in Teradata without a UDF (I don't have create function permissions). If that's the only way then I can put in a request, but I'd prefer not to if it can be avoided.
Upvotes: 1
Views: 1364
Reputation: 50019
I usually don't write all the code out here on SO, but as I thought about this one it sounded like fun.
Like I noted in the comments to your question, you can't do this in a Macro since you can't use a macro parameter as database object. They only work for values in your data. So:
Select * From Table Where F1= :myparam;
is cool in a Macro, but:
Select * From Table Where :myparam = 'somevalue';
isn't allowed.
However, you can do this in a Stored Procedure or in whatever scripting language you like.
The problem is that you have two problems.
Neither of these requirements is trivial, but something like the following should do the job for you. It may need some tweaking, but I think it's close:
CREATE PROCEDURE compareTables
(
IN table1 varChar(50),
IN table2 varChar(50),
IN keyField varchar(50),
OUT dynamicallyCreatedSQL VARCHAR(10000)
)
DYNAMIC RESULT SETS 1
BEGIN
DECLARE outputSQLStatement VARCHAR(10000); --variable to hold your dynamically created sql statement that will produce the record set that we are outputting form this SP
DECLARE columnSQLStatement VARCHAR(500); --variable to hold your dynamically created sql statement that will hold the columns in Table1
DECLARE columnName VARCHAR(30); --Variable to stick the column name that we get from the column_cursor
DECLARE output_cursor CURSOR WITH RETURN ONLY FOR output_statement; --The dynamically created cursor that will hold your record set produced by outputSQLStatement
DECLARE column_cursor CURSOR FOR column_statement; --The dynamically created cursor that will hold your record set produced by columnSQLStatement
--The start of your dynamic output sql statement:
SET outputSQLStatement = '
SELECT ';
--SQL Statement for your dynamically created cursor to get the columns for your table
--TODO: Change "YourDatabaseHere" to your database...
SET columnSQLStatement = 'SELECT ColumnName FROM "DBC".Columns WHERE DATABASE=''YourDatabaseHere'' AND TableName=''' || table1 || ''';';
--Prepare the dynamically generated column SQL statement for cursor.
Prepare column_statement FROM columnSQLStatement;
--Open the cursor and Loop through each record
OPEN column_cursor;
LABEL1:
LOOP
--WOAH THERE! No data was returned. Much sorries.
-- If there is no data, this thing is going to hang...
-- And... if there is no data, it means that your table probably isn't a table. You should check your parameters.
IF (SQLSTATE ='02000') THEN
LEAVE label1;
END IF;
--Grab the column name from the record into the variable columnName
FETCH column_cursor INTO columnName;
--Now we can build the meat of that sql statement
SET outputSQLStatement = outputSQLStatement || '
SUM (
CASE
WHEN ' || table1 || '."' || columnName || '" = ' || table2 || '."' || columnName || '"
THEN 1
ELSE 0
END)/SUM(1) as "' || columnName || '",';
--End the loop and close the cursor
END LOOP LABEL1;
CLOSE column_cursor;
--There's going to be an extra comma in there that we have to remove before the FROM part of the SQL statement, lets get rid of that:
SET outputSQLStatement = Substring(OutputSQLStatement FROM 1 FOR Length(OutputSQLStatement) - 1);
--Now complete the sql statement
Set outputSQLStatement = outputSQLStatement || '
FROM ' || table1 || '
INNER JOIN ' || table2 || '
ON ' || table1 || '.' || keyfield || ' = ' || table2 || '.' || keyfield || ';';
--Set the output variable to the dynamically generated sql statement for debug fun.
Set dynamicallyCreatedSQL = outputSQLStatement;
--And finally... execute the statement by prepping it and opening the cursor.
-- we don't close the cursor so that the "Dynamic Result Sets 1" catches it and returns it to whatever calls this procedure.
PREPARE output_statement FROM outputSQLStatement;
OPEN output_cursor;
END;
You can call this like:
CALL compareTables('table1', 'table2', 'yourkeyfield', output);
That will return two recordsets. The first one will have the dynamically created SQL statement you can use for debugging. The second will be the record set you are after.
If you don't have CREATE PROCEDURE
access, then this is a wash. But, regardless, this would be the method you would have to use whether it's in a Teradata SP, bash with BTEQ, or some other scripting language like VBScript through ADO/ODBC or whatever.
I tried to comment it well so each part is explained, but there are some complicated things happening between the use of cursors for two different purposes (looping through a result set, and opening a result set for output from the procedure) and the dynamic sql being generated based on inputs and columns found in dbc.columns
.
Upvotes: 1