TechArch12
TechArch12

Reputation: 23

SQL Server Greatest n per Group Query: Customer Information with Latest Date

Primer: I have looked through several examples on Stack overflow and many got me close to what I needed but not exactly what I needed. Please bare with and also the one thing which I've found unhelpful is many of the answers have thrown a solution without explanation to how it is working and why parts are needed. So if the answer can include explanation and breakdown of the query that would be great. Thank you.

Problem: I have a Customer table as shown below.

Id     Name     Account     Active
1      Bob      ABC         True
2      Jenny    BED         True
3      Tony     POT         False
4      David    DON         False

I Also have an Order table. See below

OrderId     CustomerId     OrderDate     Status
1           2              2016-04-01    3
2           2              2016-09-05    7
3           2              2017-02-20    5
4           3              2015-02-20    8
5           4              2017-04-16    3

What I need: So I need to have a table which has the Customer.Id., Customer.Name, Customer.Account, Customer.Active, Order.Status AND the Latest OrderDate which has a status of either 3 or 7. The table should show all customers and a null value for Order.Status and LatestDate where the Customer has either never ordered OR their order doesn't fit the status criteria. See below for an example.

CustomerId     Name     Account     Active     Status     LatestDate
1              Bob      ABC         True       NULL       NULL
2              Jenny    BED         True       7          2016-09-05
3              Tony     POT         False      NULL       NULL
4              David    DON         False      3          2017-04-16

So as you can see both Bob and Tony have a status and latestDate of null because they have either not made a transaction or they do not meet the criteria of the status for the Order to be 3 or 7. David does and only has one order so his latest is simple. Jenny has multiple orders and multiple which fit the criteria and so it takes the order with the latest date and adds that to our table (still taking into account whether the status is 3 or 7).

Now in previous attempts at this issue I've ended up with all transactions being taken in to account (not just 3 and 7's), I've had duplicate customers shown (couldn't use distinct due to the date differing), and in another attempt it got all the data but ignored anything which had a null value. As previously stated I've seen plenty of examples on Stack overflow, but they all seem to give an issue or problem to my case and don't offer enough explanation or breakdown for me to be able to modify it to my cause.

So if someone can offer a solution to this issue as well as a breakdown of the SQL just for my clear understanding (and that of others). Thanks in advance.

My latest attempt was:

Select Distinct c.Id, c.[Name], c.Account, c.Active, o.Status, 
LatestDates.LatestDate
From Customer as c
Left Outer Join
(Select CustomerId, Max(OrderDate) LatestDate From [Order] Group By 
CustomerId) LatestDates On c.Id = LatestDates.CustomerId
Left Join [Order] o on LatestDates.CustomerId = o.CustomerId and 
LatestDates.LatestDate = o.OrderDate
Where c.Active is not null
Order by c.Active

However this solution did not take into account the 3 and 7 check and I did not understand it enough to confirm that the data I am getting is correct. This was taken from a differing stack overflow question. Let me know if you need any more details.

Upvotes: 2

Views: 109

Answers (2)

SqlZim
SqlZim

Reputation: 38023

Using top with ties with row_number()

select top 1 with ties
    c.Id, c.[Name], c.Account, c.Active, o.Status, o.OrderDate
from Customer as c
  left join [Order] o
    on c.Id= o.CustomerId
   and o.Status in (3,7)
where c.Active is not null
order by row_number() over (partition by c.Id order by o.OrderDate desc)

or with outer apply()

select 
  c.Id, c.[Name], c.Account, c.Active, o.Status, o.OrderDate
from Customer as c
  outer apply (
    select top 1
      i.Status, i.OrderDate
    from [Order] i
    where i.CustomerId = c.Id
      and i.Status in (3,7)
    order by i.OrderDate desc
  ) as o

or adjusting your query attempt by including the status criteria in the derived table query:

Select Distinct c.Id, c.[Name], c.Account, c.Active, o.Status, LatestDates.LatestDate
From Customer as c
  Left Outer Join (
      Select CustomerId, Max(OrderDate) as LatestDate 
      From [Order] i
      where i.Status in (3,7)
      Group By CustomerId
      ) LatestDates On c.Id = LatestDates.CustomerId
  Left Join [Order] o 
    on LatestDates.CustomerId = o.CustomerId 
   and LatestDates.LatestDate = o.OrderDate
   and o.Status in (3,7)
Where c.Active is not null
Order by c.Active

rextester demo for all 3: http://rextester.com/WNVKL52324

all 3 return:

+----+-------+---------+--------+--------+------------+
| Id | Name  | Account | Active | Status | LatestDate |
+----+-------+---------+--------+--------+------------+
|  1 | Bob   | ABC     | True   | NULL   | NULL       |
|  2 | Jenny | BED     | True   | 7      | 2016-09-05 |
|  3 | Tony  | POT     | False  | NULL   | NULL       |
|  4 | David | DON     | False  | 3      | 2017-04-16 |
+----+-------+---------+--------+--------+------------+

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269763

Just use row_number():

select c.*, o.status, o.latestdate
from Customer c left join
     (select o.*,
             row_number() over (partition by o.customerId order by o.orderDate desc) as seqnum
      from orders o
     ) o
     on o.customerId = c.customerId
where seqnum = 1 and c.Active is not null
order by c.Active;

Upvotes: 1

Related Questions