Reputation: 2450
for my SQL Server 2016 project I have an Orders table looks like the one below and I want to create a SQL query that shows the oldest order for each customer / product. There are thousands of orders in the Orders table today and I should expect this to grow in size so I want this to perform well.
The goal is the output to look like this:
OrderID | CustomerID | ProductID | OrderDt | OrderAmt |
---|---|---|---|---|
123 | 1 | 1 | 1/1/2021 | $50 |
456 | 1 | 2 | 1/2/2021 | $20 |
345 | 2 | 1 | 1/1/2021 | $30 |
The data in the Orders table today look like this:
OrderID | CustomerID | ProductID | OrderDt | OrderAmt |
---|---|---|---|---|
123 | 1 | 1 | 1/1/2021 | $50 |
758 | 1 | 1 | 1/2/2021 | $80 |
563 | 1 | 2 | 1/3/2021 | 74 |
684 | 1 | 2 | 1/4/2021 | 23 |
456 | 1 | 2 | 1/2/2021 | $20 |
345 | 2 | 1 | 1/1/2021 | $30 |
Upvotes: 0
Views: 759
Reputation: 1269763
The canonical method is to use row_number()
:
select t.*
from (select t.*,
row_number() over (partition by customerid, productid order by orderdt, orderid) as seqnum
from t
) t
where seqnum = 1;
With an index on (customerid, productid, orderdt)
, then a correlated subquery might be a smidgen faster:
select t.*
from t
where t.orderdt = (select min(t2.orderdt)
from t t2
where t2.productid = t.productid and t2.customerid = t.customerid
);
Or a slightly less performance method without subqueries:
select top (1) with ties t.*
from t
order by row_number() over (partition by productid, customerid order by orderdt);
Upvotes: 1