Darshan P
Darshan P

Reputation: 2241

SQL Server Performance when sorting with left join

I have a very simple query that is taking way to long to return the top 20 rows.

SELECT    o.OrderId, 
          p.PaymentDate

FROM      dbo.Orders o                              -- 6 million records

LEFT JOIN dbo.Payments p ON p.OrderId = o.OrderId   -- 3.5 million records

WHERE     o.IsDeleted = 0                           -- There is an index on this column.

ORDER BY  p.PaymentDate DESC                        -- There is an index on this column and on OrderId
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY

The execution plan shows an index scan on both Orders and Payments table with 6 million rows read from Orders and 3 million rows read from the payments table.

It is taking around 4-5 seconds to return 20 rows.

How can I make this query return result faster?

Upvotes: 1

Views: 588

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269743

You are sorting by the payment date -- so it wuld seem that you need a match. Hence, an inner join is sufficient.

In addition, OrderId is in the payments table. So, you can try this version:

SELECT p.OrderId, p.PaymentDate
FROM dbo.Payments p
WHERE EXISTS (SELECT 1
              FROM dbo.Orders o
              WHERE p.OrderId = o.OrderId AND
                    o.IsDeleted = 0 
             )
ORDER BY p.PaymentDate DESC
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY;

Then for this query, you want an indexes on:

  • Payments(PaymentDate DESC, OrderId)
  • Orders(OrderId, IsDeleted).

Upvotes: 1

Caius Jard
Caius Jard

Reputation: 74605

You're sorting a huge dataset just to retrieve 20 rows

If this is a typical order system then IsDeleted=0 probably doesn't help much, because what percentage of orders would be deleted? 0.1%? Excluding them makes little difference to the number of orders you're sorting

I'd put a paymentdate criteria in there WHERE too, cut the record count down a bit so that a much smaller set is being sorted. The logic you're after at the moment is that you want 20 records, but you don't need to sort 6 million records to get your 20. Cutting it down to 20 would do. This is clearly a busy ordering system, and probably at least 20 orders a day are being submitted so you may find you get your needed 20 by selecting only records with a paymentdate since one day ago, but pick a good number of days to give your 20, and maybe even implement some logic where your front end app increases the number of days if you don't get your 20 first time round. You can cache this lookback value in the front end and decrease it again as the days go by, so it becomes an automatically derived figure

The other issue I think you have is that you've made it a LEFT JOIN, yet I can't see the sense in doing so- you're after the most recently paid for orders, right? And there is no payment record for unpaid orders? Whatever the reason why there are only 3.5 million payments for 6 million orders, you don't seem to be interested in the orders that have no payment record, so I can't see the sense in having it be a left join that includes and sorts 2.5 million records you definitely don't want. Make it an inner join

Upvotes: 2

Related Questions