Largo7321
Largo7321

Reputation: 15

SQL - IN (Select ... ) Performance issue

I know in (select...) stuff should be avoided but in my situation I couldn't find another way to do it.

Unfortunately, as the database is huge, I would like to find a better way. So could you please share your experience to optimise the query below ?

Here is an idea of the db

ID   OrderRef   Product    OrderDate   ShipmentDate    Client

1    111        T-Shirt    1/1/2018    4/1/2018        Georges
2    111        Pull-Over  1/1/2018    6/1/2018        (Unknown)
3    222        Shoes      9/1/2018    15/1/2018       Austin
4    222        T-Shirt    9/1/2018    18/1/2018       (Unknown)

What I need to retrieve is :

Here is my code

SELECT t1.OrderRef, t1.Product, t1.OrderDate, t1.Client, t4.max_date
FROM table1 as t1
RIGHT JOIN (SELECT t2.OrderRef, max(t2.ShipmentDate) as max_date
            FROM table1 as t2
            WHERE t2.OrderRef in(
                            SELECT t3.OrderRef 
                            FROM table1 as t3
                            WHERE t3.Client='Georges')
            GROUP BY t2.OrderRef) as t4 on t2.OrderRef=t1.OrderRef

The right join is there to retrieve only the OrderReference linked to Georges, and the subquery to calculate the the Latest ShipmentDate

Thanks in advance for your help, Largo

Upvotes: 0

Views: 69

Answers (2)

sticky bit
sticky bit

Reputation: 37477

I'd try to use (correlated) subqueries.

SELECT t11.orderref,
       t11.product,
       t11.orderdate,
       t11.client,
       (SELECT max(t13.shipmentdate)
               FROM table1 t13
               WHERE t13.orderref = t11.orderref) max_date
       FROM table1 t11
       WHERE t11.orderref = (SELECT t12.orderref
                                    FROM table1 t12
                                    WHERE t12.client = 'Georges');

To support the subquery getting the orderref an index like

CREATE INDEX table1_c_or
             ON table1 (client ASC,
                        orderref ASC);

should help. That subquery is "pointy" anyway.

For getting the maximum shipmentdate

CREATE INDEX table1_or_sd
             ON table1 (orderref ASC,
                        shipmentdate DESC);

should help. The optimizer should notice that the maximum needs to be retrieved only once as the orderref is always the same. That index can also be used to support the outer query as orderref is its first key.

Optionally creating an additional index like

CREATE INDEX table1_or_p_od_c
             ON table1 (orderref ASC,
                        product ASC,
                        orderdate ASC,
                        client ASC);

might even better support the outer query. But that's not a necessity I guess.

Upvotes: 0

The Impaler
The Impaler

Reputation: 48850

You seem to be retrieving a few rows only on each call. This should be pretty fast even if you have billions of rows. I would do something like:

with
a as (
  select max(orderref) as orderref
  from t
  where client = 'Georges'
),
b as (
  select t.* 
  from t
  join a on a.orderref = t.orderref
),
c as (
  select max(shipmentdate) as maxshipmentdate from b
)
select b.*, c.maxshipmentdate
from b
cross join c 

The query above should be pretty fast is you have the following indexes:

create index ix1 on t (client);
create index ix2 on t (orderref);

Upvotes: 2

Related Questions