Lenwood
Lenwood

Reputation: 1404

Improve Netezza SQL Query That Contains Hundreds of Strings in WHERE Clause

I have a Netezza query with a WHERE clause that includes several hundred potential strings. I'm surprised that it runs, but it takes time to complete and occasionally errors out ('transaction rolled back by client'). Here's a pseudo code version of my query.

SELECT
    TO_CHAR(X.I_TS, 'YYYY-MM-DD') AS DATE,
    X.I_SRC_NM AS CHANNEL,
    X.I_CD AS CODE,
    COUNT(DISTINCT CASE WHEN X.I_FLG = 1 THEN X.UID ELSE NULL) AS WIDGETS

FROM
    (SELECT
        A.I_TS,
        A.I_SRC_NM,
        A.I_CD,
        B.UID,
        B.I_FLG

    FROM
        SCHEMA.DATABASE.TABLE_A A
        LEFT JOIN SCHEMA.DATABASE.TABLE_B B ON A.UID = B.UID

    WHERE
        A.I_TS BETWEEN '2017-01-01' AND '2017-01-15'
        AND B.TAB_CODE IN ('00AV', '00BX', '00C2', '00DJ'...
                           ...
                           ...
                           ...
                           ...
                           ...
                           ...
                           ...)
    ) X

GROUP BY
    X.I_TS,
    X.I_SRC_NM,
    X.I_CD
;

In my query, I'm limiting the results on B.TAB_CODE to about 1,200 values (out of more than 10k). I'm honestly surprised that it works at all, but it does most of the time.

Is there a more efficient way to handle this?

Upvotes: 1

Views: 614

Answers (3)

Lars G Olsen
Lars G Olsen

Reputation: 1118

  1. If im guessing correctly , the X.I_TS is in fact a ‘timestamp’, and as such i expect it to contain many different values per day. Can you confirm that? If I’m right the query can possibly benefit from changing the ‘group by X.I._TS,...’ to ‘group by 1,...’

  2. Furthermore the ‘Count(Distinct Case...’ can never return anything else than 1 or NULL. Can you confirm that? If I’m right on that, you can get rid of the expensive ‘DISTINCT’ by changing it to ‘MAX(Case...’

Can you follow me :)

Upvotes: 0

joebeeson
joebeeson

Reputation: 4366

We've seen situations where it's "cheaper" to CTAS the original table to another, distributed on your primary condition, and then querying that table instead.

Upvotes: 1

Stavr00
Stavr00

Reputation: 3314

If the IN clause becomes too cumbersome, you can make your query in multiple parts. Create a temporary table containing a TAB_CODE set then use it in a JOIN.

WITH tab_codes(tab_code) AS (
 SELECT '00AV' 
 UNION ALL
 SELECT '00BX' 
 --- etc ---
)
SELECT
  TO_CHAR(X.I_TS, 'YYYY-MM-DD') AS DATE,
  X.I_SRC_NM AS CHANNEL,
  --- etc ---
  INNER JOIN tab_codes Q ON B.TAB_CODES = Q.tab_code

If you want to boost performance even more, consider using a real temporary table (CTAS)

Upvotes: 3

Related Questions