kevin.w.johnson
kevin.w.johnson

Reputation: 1794

Redshift - window function - Get stats over previous hour for each row

I'm trying to write a query for Redshift based on product orders. The table contains columns like store_id, order_number, order_datetime, products_ordered, order_time. The query I'm trying to write would select from this table, and for each row it would include some stats based on the previous hour of orders at this store.

Currently I can do something like:

SELECT store_id, order_number, order_datetime, products_ordered, order_time,
       (SELECT COUNT(*) FROM mtable WHERE store_id=o.store_id AND order_time BETWEEN (o.order_time - interval '1 hour') AND o.order_time) as prev_num_orders,
       (SELECT AVG(products_ordered) FROM mtable WHERE store_id=o.store_id AND order_time BETWEEN (o.order_time - interval '1 hour') AND o.order_time) as prev_avg_orders
FROM mtable o;

The performance on this is horrible. One of the main causes is likely that I have to look up the previous hours' orders twice to get the two different stats. Is there a way to optimize this? I think there should be a window function but I'm not sure.

Upvotes: 0

Views: 282

Answers (2)

Edgars T.
Edgars T.

Reputation: 1149

I can't think of valid window range for this case, as value range is only common factor. As Redshift quite good at large datasets I would suggest following solution:

SELECT store_id, 
       order_number, 
       order_datetime, 
       products_ordered, 
       order_time, 
       COUNT(prev_orders.store_id) prev_num_orders,
       AVG(prev_orders.products_ordered)  prev_avg_orders
FROM mtable o
left join mtable prev_orders on prev_orders.store_id=o.store_id 
                            AND prev_orders.order_time BETWEEN (o.order_time - interval '1 hour') AND o.order_time 
--and o.order_number != prev_orders.order_number
group by store_id, 
       order_number, 
       order_datetime, 
       products_ordered, 
       order_time;

Note that prev_num_orders and prev_avg_orders column statistic will also include current order as well. To eliminate current order from statistics un-comment order number comparision line from SQL statement.

Upvotes: 1

zip
zip

Reputation: 4061

I dont have the data to test de performance, but when I faced a similar issue on Redshift this is what I did:

    with cte as
    (
    SELECT store_id, order_number, order_datetime, products_ordered, order_time,
    LAG (products_ordered,1) OVER (PARTITION BY store_id ORDER BY order_time) AS prev_products_ordered
    from mtable
    )
    select store_id, order_number, order_datetime, products_ordered, order_time,
    count(*) as prev_num_orders, avg(prev_products_ordered) as prev_avg_orders 
from cte
    group by store_id, order_number, order_datetime, products_ordered, order_time

Upvotes: 0

Related Questions