Reputation: 733
I am using Teradata to do the following. Say I have the following Table1
col1 col2 col3 col4 col5
1 A NULL NULL D
2 B NULL NULL C
3 A B NULL D
4 A B C D
and Table2
col1 col2 col3 col4
1 A D 27
2 B C 334
3 A B 434
4 B D 100
5 C D 200
I want to join Table1 and Table2 (will be 3 times) so that I can create a table like this
col1 col2 col3 col4 col5 col_val_1 col_val_2 col_val_3
1 A NULL NULL D 27 NULL NULL
2 B NULL NULL C 334 NULL NULL
3 A B NULL D 434 100 NULL
4 A B C D 434 334 200
I can create this table by the following code
select tab1.*, tab2_1.col4 as col_val_1, tab2_2.col4 as col_cal_2, tab2_3.col4 as col_val_3
from Table1 tab1
left outer join Table2 tab2_1
on tab2_1.col2 = tab1.col2
and tab2_1.col3 = coalesce(tab1.col3,tab1.col5) /* if col3 is Null then join on col5. I want to calculate pair wise value. If col3 is NULL, the pair is col2-col5.*/
left outer join Table2 tab2_2
on tab2_2.col2 = coalesce(tab1.col3,0)
and tab2_2.col3 = coalesce(tab1.col4, tab1.col5)
left outer join Table2 tab2_3
on tab2_3.col2 = coalesce(tab1.col4,0)
and tab2_3.col3 = tab1.col5
The Table1 data is such that if col3 is null col4 will be null. col2 and col5 are never null. So if col3 is null I will have col2-col5. If col3 is not null and col4 is null then I will have col2-col3, col3-col5. If nothing is null then I will have col2-col3, col3-col4, col4-col5.
This query runs for a small table and gives the desired output. However, this is a complex query. I ran EXPLAIN
on this and the estimated runtime is in >10^5
hours. I was wondering if there is a way. This query can be optimized.
Upvotes: 1
Views: 1266
Reputation: 3781
In general you really want to avoid functions in join conditions. It can be symptomatic of design flaws if you can't get around it. This is a bizarre way to have to join two tables.
However, if that ball isn't in your court and you are stuck with it, this might have better performance. My hunch is that a product join is happening somewhere.
It's really heavy handed though.
CREATE MULTISET VOLATILE TABLE Table1 AS tab1
( SELECT col1,
col2,
col3,
col4,
col5,
COALESCE(tab1.col3,tab1.col5) AS col_35,
COALESCE(tab1.col4,tab1.col5) AS col_45,
COALESCE(tab1.col3,0) AS col_30,
COALESCE(tab1.col4,0) AS col_40
FROM Table1
)
WITH DATA PRIMARY INDEX (col2) ON COMMIT PRESERVE ROWS;
SELECT tab1.*,
tab2_1.col4 AS col_val_1,
tab2_2.col4 AS col_cal_2,
tab2_3.col4 AS col_val_3
FROM tab1
LEFT
JOIN Table2 AS tab2_1
ON tab2_1.col2 = tab1.col2
AND tab2_1.col3 = tab1.col_35
LEFT
JOIN Table2 AS tab2_2
ON tab2_2.col2 = tab1.col_30
AND tab2_2.col3 = tab1.col_45
LEFT
JOIN Table2 AS tab2_3
ON tab2_3.col2 = tab1.col_40
AND tab2_3.col3 = tab1.col_5;
Upvotes: 2