deepAgrawal
deepAgrawal

Reputation: 733

Optimize Teradata Query multiple table conditional join

I am using Teradata to do the following. Say I have the following Table1

col1  col2 col3 col4   col5
1     A    NULL  NULL  D
2     B    NULL  NULL  C    
3     A    B     NULL  D
4     A    B     C     D

and Table2

col1  col2 col3 col4
1     A    D    27
2     B    C    334    
3     A    B    434
4     B    D    100
5     C    D    200

I want to join Table1 and Table2 (will be 3 times) so that I can create a table like this

col1  col2 col3 col4   col5  col_val_1   col_val_2 col_val_3
1     A    NULL  NULL  D     27           NULL      NULL
2     B    NULL  NULL  C     334          NULL      NULL
3     A    B     NULL  D     434          100       NULL
4     A    B     C     D     434          334       200

I can create this table by the following code

select tab1.*, tab2_1.col4 as col_val_1, tab2_2.col4 as col_cal_2, tab2_3.col4 as col_val_3
from Table1 tab1
left outer join Table2 tab2_1
on tab2_1.col2 = tab1.col2
and tab2_1.col3 = coalesce(tab1.col3,tab1.col5) /* if col3 is Null then join on col5. I want to calculate pair wise value. If col3 is NULL, the pair is col2-col5.*/
left outer join Table2 tab2_2
on tab2_2.col2 = coalesce(tab1.col3,0)
and tab2_2.col3 = coalesce(tab1.col4, tab1.col5)
left outer join Table2 tab2_3
on tab2_3.col2 = coalesce(tab1.col4,0)
and tab2_3.col3 = tab1.col5

The Table1 data is such that if col3 is null col4 will be null. col2 and col5 are never null. So if col3 is null I will have col2-col5. If col3 is not null and col4 is null then I will have col2-col3, col3-col5. If nothing is null then I will have col2-col3, col3-col4, col4-col5.

This query runs for a small table and gives the desired output. However, this is a complex query. I ran EXPLAIN on this and the estimated runtime is in >10^5 hours. I was wondering if there is a way. This query can be optimized.

Upvotes: 1

Views: 1266

Answers (1)

Error_2646
Error_2646

Reputation: 3781

In general you really want to avoid functions in join conditions. It can be symptomatic of design flaws if you can't get around it. This is a bizarre way to have to join two tables.

However, if that ball isn't in your court and you are stuck with it, this might have better performance. My hunch is that a product join is happening somewhere.

It's really heavy handed though.

CREATE MULTISET VOLATILE TABLE Table1 AS tab1
  ( SELECT col1,
           col2,
           col3,
           col4,
           col5,
           COALESCE(tab1.col3,tab1.col5) AS col_35,
           COALESCE(tab1.col4,tab1.col5) AS col_45,
           COALESCE(tab1.col3,0) AS col_30,
           COALESCE(tab1.col4,0) AS col_40
      FROM Table1
  )
WITH DATA PRIMARY INDEX (col2) ON COMMIT PRESERVE ROWS;

SELECT tab1.*, 
       tab2_1.col4 AS col_val_1, 
       tab2_2.col4 AS col_cal_2, 
       tab2_3.col4 AS col_val_3
  FROM tab1
  LEFT
  JOIN Table2 AS tab2_1
    ON tab2_1.col2 = tab1.col2
   AND tab2_1.col3 = tab1.col_35
  LEFT
  JOIN Table2 AS tab2_2
    ON tab2_2.col2 = tab1.col_30
   AND tab2_2.col3 = tab1.col_45   
  LEFT
  JOIN Table2 AS tab2_3
    ON tab2_3.col2 = tab1.col_40
   AND tab2_3.col3 = tab1.col_5;

Upvotes: 2

Related Questions