apoellitsi
apoellitsi

Reputation: 271

Optimize query Teradata

I would appreciate it if you can help me with a problem that i have. I have this join condition :

SELECT *
FROM    
T1_STAGING.(first_table) AS STG 
JOIN T1_STAGING.(second_table) AS B
    ON 
    (
            STG.DLOF_ID_NO=B.DLOF_ID_NO_RU
    )

This simple join is taking too long to finish, more than 20 minutes. The data of each table is less than 600,000K data. i tried the following things :

I took statistics on each table. I changed the columns to be PRIMARY INDEX. I created JOIN INDEX for the second table but still nothing! The query never ends it takes 20 mins ++. This seems to be data distribution problem in the second table, but i can't do anything with the data. Please bear in mind that if i join my first_table with any other it takes only seconds.

Can you give me a suggestion to try? I need to optimize it for better performance.

Here is the explain of TERADATA: Explain SEL *
FROM
T1_STAGING.DLS_DLO_OWS_STAGE_STG AS STG JOIN T1_STAGING.DLS_ACQUISITION_STG AS B ON ( STG.DLOF_ID_NO=B.DLOF_ID_NO_RU )

1) First, we lock a distinct T1_STAGING."pseudo table" for read on a RowHash to prevent global deadlock for T1_STAGING.STG. 2) Next, we lock a distinct T1_STAGING."pseudo table" for read on a RowHash to prevent global deadlock for T1_STAGING.B. 3) We lock T1_STAGING.STG for read, and we lock T1_STAGING.B for read. 4) We execute the following steps in parallel. 1) We do an all-AMPs RETRIEVE step from T1_STAGING.B by way of an all-rows scan with no residual conditions split into Spool 2 (all_amps) with a condition of ("DLOF_ID_NO_RU IN (:)") to qualify skewed rows and Spool 3 (all_amps) with a condition of ("DLOF_ID_NO_RU IN (:)") to qualify rows matching skewed rows of the skewed relation and Spool 4 (all_amps) with remaining rows fanned out into 2 hash join partitions. Spool 2 is built locally on the AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 303 rows. Spool 3 is built locally on the AMPs. The size of Spool 3 is estimated with high confidence to be 4,710 rows. Spool 4 is redistributed by hash code to all AMPs. The size of Spool 4 is estimated with high confidence to be 97,742 rows. The estimated time for this step is 1.27 seconds. 2) We do an all-AMPs RETRIEVE step from T1_STAGING.STG by way of an all-rows scan with no residual conditions split into Spool 6 (all_amps) with a condition of ("DLOF_ID_NO IN (:)") to qualify skewed rows and Spool 5 (all_amps) with a condition of ("DLOF_ID_NO IN (:)") to qualify rows matching skewed rows of the skewed relation and Spool 7 (all_amps) with remaining rows fanned out into 2 hash join partitions. Spool 6 is built locally on the AMPs. The size of Spool 6 is estimated with high confidence to be 21,587 rows. Spool 5 is built locally on the AMPs. The size of Spool 5 is estimated with high confidence to be 7 rows. Spool 7 is redistributed by hash code to all AMPs. The size of Spool 7 is estimated with high confidence to be 301,682 rows. The estimated time for this step is 4.20 seconds. 5) We execute the following steps in parallel. 1) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 8 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 8 by the hash code of (T1_STAGING.STG.DLOF_ID_NO). The size of Spool 8 is estimated with high confidence to be 336 rows ( 640,080 bytes). The estimated time for this step is 0.01 seconds. 2) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 9 (all_amps), which is duplicated on all AMPs. The result spool file will not be cached in memory. The size of Spool 9 is estimated with high confidence to be 226,080 rows (391,796,640 bytes). The estimated time for this step is 1.05 seconds. 6) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of a RowHash match scan, which is joined to Spool 2 (Last Use) by way of a RowHash match scan. Spool 8 and Spool 2 are joined using a merge join, with a join condition of ("DLOF_ID_NO = DLOF_ID_NO_RU"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 2,121 rows (11,491,578 bytes). The estimated time for this step is 0.03 seconds. 7) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an all-rows scan, which is joined to Spool 9 (Last Use) by way of an all-rows scan. Spool 6 and Spool 9 are joined using a single partition hash join, with a join condition of ("DLOF_ID_NO = DLOF_ID_NO_RU"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 9,243,161 rows (50,079,446,298 bytes). The estimated time for this step is 0.60 seconds. 8) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to Spool 7 (Last Use) by way of an all-rows scan. Spool 4 and Spool 7 are joined using a hash join of 2 partitions, with a join condition of ("DLOF_ID_NO = DLOF_ID_NO_RU"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 731,525 rows (3,963,402,450 bytes). The estimated time for this step is 0.96 seconds. 9) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 6.84 seconds.

Upvotes: 0

Views: 1275

Answers (1)

avnays
avnays

Reputation: 1

600,000K you mean 600M, right? ;) That's not that little.

First. The columns used in JOIN should be indexed. (Looks like you've done it) 2. Add WHERE condition to choose specific values you need 3. Add LIMIT to limit the SELECT result

And finally, use EXPLAIN SELECT ... to understand the issue.

Upvotes: 0

Related Questions