Simon1979
Simon1979

Reputation: 2108

Force Redshift to Assess Specific Predicate First

I have a small application which on a daily schedule executes a series of SQL scripts against a Redshift database and populate tables with aggregated data ready for the customer to extract. The scripts are held in text files and can be easily updated, the SQL is extracted from file '|businessday|' replaced with the required date eg '20180501'. There is no robust logic to base the date on the current calendar date.

Requirements from the customer have changed and now there are two scripts which only need to populate the tables on the last day of the month. I can update the script so the predicate reads:

WHERE (SELECT businessday FROM bd) = LAST_DAY((SELECT businessday FROM bd))

Where bd is a CTE so I can cast the date string as DATE.

While this correctly returns no records it only takes marginally less time to execute as when I run it for the whole month - it takes over a minute to return 0 rows. I would expect it to quickly identify this predicate as failing and return no rows almost instantly.

Is there a way to restructure the SQL to assess this predicate first?

My understanding is that you cannot use procedural IF statements in Redshift so I am limited to adding predicates in the SQL string.

I have tried adding a second CTE which returns no predicates on the businessday columns of the key tables:

WITH bd as (SELECT CAST('20180425' as date) as businessday WHERE 
    (SELECT CAST('20180425' as date)) = LAST_DAY(( CAST('20180425' as date)))
...
WHERE ts.businessday in (select businessday from bd)

(this would need modification to get what i need but the principle does not seem to work)

Simplified SQL string (couple of tables and columns removed):

with cte as (select storeid from ttl_store_processed where
        businessday = '20180425'),
    bd as (SELECT CAST('20180425' as date) as businessday 
        WHERE (SELECT CAST('20180425' as date)) = LAST_DAY(( CAST('20180425' as date))))
SELECT store.storenumber AS COST_CENTER,
     TO_CHAR(DATE(tii.BusinessDay), 'YYYYMM') AS YEAR_MONTH,
     ii.ItemCode AS MATERIAL_NUMBER,
     SUM(tii.Quantity) AS UNITS
FROM cte s
    inner join transactionsale ts 
        on s.storeid = ts.storeid
    inner join Store store 
        on ts.storeid = store.storeid
    inner join transactionsaleitem tsi 
        on ts.transactionsaleid = tsi.transactionsaleid
    inner join transactioninventoryitem tii 
        on tsi.transactionsaleitemid = tii.transactionsaleitemid
    inner join inventoryitem ii 
        on tii.inventoryitemid = ii.inventoryitemid
WHERE (SELECT businessday FROM bd) = LAST_DAY((SELECT businessday FROM bd))
    AND ts.storeid IN (SELECT storeid FROM cte)
    AND ts.businessday BETWEEN DATE_TRUNC('MONTH', (SELECT businessday FROM bd)) 
        AND LAST_DAY((SELECT businessday FROM bd))
GROUP BY 
     store.storenumber,
     TO_CHAR(DATE(tii.BusinessDay), 'YYYYMM'),
     ii.ItemCode;

cte currently returns ~20 stores but this will increase to potentially 180+. I have tried applying logic so this table is empty:

with cte as (select storeid from mcdonaldshk.ttl_store_processed 
    where businessday = '20180425' and (SELECT CAST('20180425' as date)) 
        = LAST_DAY(( CAST('20180425' as date))))

This does not appear to work either

Upvotes: 0

Views: 401

Answers (1)

John Rotenstein
John Rotenstein

Reputation: 269322

So, you're basically saying that you'd like it to run really fast when (SELECT businessday FROM bd) = LAST_DAY((SELECT businessday FROM bd)) is false, by making it evaluate that first?

You could try joining your query to a sub-query:

JOIN (SELECT 'end of month'
      FROM bd
      WHERE businessday = LAST_DAY(businessday)
      ) lastday ON (true)

This way, if it is NOT the last day, it returns zero rows, so there are no rows to join to. If this gets evaluated first, the rest of the query won't be executed because there's no rows to JOIN.


By the way, you can also simplify some code:

WHERE (SELECT CAST('20180425' as date)) = LAST_DAY(( CAST('20180425' as date)))

can simply be:

WHERE ('20180425'::date) = LAST_DAY('20180425'::date)

Also, if you add a JOIN to bd, then you can simplify

ts.businessday BETWEEN 
    DATE_TRUNC('MONTH', (SELECT businessday FROM bd)) 
    AND LAST_DAY((SELECT businessday FROM bd))

into:

ts.businessday BETWEEN 
    DATE_TRUNC('MONTH', businessday) 
    AND LAST_DAY(businessday)

Upvotes: 1

Related Questions