Reputation: 1404
I have a Netezza query with a WHERE clause that includes several hundred potential strings. I'm surprised that it runs, but it takes time to complete and occasionally errors out ('transaction rolled back by client'). Here's a pseudo code version of my query.
SELECT
TO_CHAR(X.I_TS, 'YYYY-MM-DD') AS DATE,
X.I_SRC_NM AS CHANNEL,
X.I_CD AS CODE,
COUNT(DISTINCT CASE WHEN X.I_FLG = 1 THEN X.UID ELSE NULL) AS WIDGETS
FROM
(SELECT
A.I_TS,
A.I_SRC_NM,
A.I_CD,
B.UID,
B.I_FLG
FROM
SCHEMA.DATABASE.TABLE_A A
LEFT JOIN SCHEMA.DATABASE.TABLE_B B ON A.UID = B.UID
WHERE
A.I_TS BETWEEN '2017-01-01' AND '2017-01-15'
AND B.TAB_CODE IN ('00AV', '00BX', '00C2', '00DJ'...
...
...
...
...
...
...
...)
) X
GROUP BY
X.I_TS,
X.I_SRC_NM,
X.I_CD
;
In my query, I'm limiting the results on B.TAB_CODE
to about 1,200 values (out of more than 10k). I'm honestly surprised that it works at all, but it does most of the time.
Is there a more efficient way to handle this?
Upvotes: 1
Views: 614
Reputation: 1118
If im guessing correctly , the X.I_TS is in fact a ‘timestamp’, and as such i expect it to contain many different values per day. Can you confirm that? If I’m right the query can possibly benefit from changing the ‘group by X.I._TS,...’ to ‘group by 1,...’
Furthermore the ‘Count(Distinct Case...’ can never return anything else than 1 or NULL. Can you confirm that? If I’m right on that, you can get rid of the expensive ‘DISTINCT’ by changing it to ‘MAX(Case...’
Can you follow me :)
Upvotes: 0
Reputation: 4366
We've seen situations where it's "cheaper" to CTAS the original table to another, distributed on your primary condition, and then querying that table instead.
Upvotes: 1
Reputation: 3314
If the IN
clause becomes too cumbersome, you can make your query in multiple parts. Create a temporary table containing a TAB_CODE set then use it in a JOIN
.
WITH tab_codes(tab_code) AS (
SELECT '00AV'
UNION ALL
SELECT '00BX'
--- etc ---
)
SELECT
TO_CHAR(X.I_TS, 'YYYY-MM-DD') AS DATE,
X.I_SRC_NM AS CHANNEL,
--- etc ---
INNER JOIN tab_codes Q ON B.TAB_CODES = Q.tab_code
If you want to boost performance even more, consider using a real temporary table (CTAS
)
Upvotes: 3