RobPio
RobPio

Reputation: 137

Query that uses min aggregate function on multiple columns with priority

I've looked around on the internet for the solution to this problem but can't seem to find exactly what I am looking for.

lets say that we have a table that looks like this:

_____________________________________________________
| id (PK) | name  |  date      | ... other data ... |
-----------------------------------------------------
|    1    |  BOB  | 2016-11-20 |      ....
--------------------------------     
|    3    |  CARL | 2015-09-02 |      ....
--------------------------------
|    4    |  BOB  | 2016-11-18 |      ....
--------------------------------
|    5    |  JON  | 2016-03-03 |      ....
--------------------------------
|    6    |  TIM  | 2016-11-24 |      ....
--------------------------------
|    7    |  TIM  | 2016-11-24 |      ....
--------------------------------
|    8    |  JON  | 2016-05-05 |      ....
--------------------------------

I need a query that will:
1. return a distinct list of names (distinct function)
2. for each name have the earliest date (min function)
3. id that goes with the above two columns.
4. if the date is the same, use min(id) as a way to decide which row to select

so the desired results in this case would look like this..

_____________________________________________________ 
| id (PK) | name  |    date    | ... other data ... |
-----------------------------------------------------
|    3    |  CARL | 2015-09-02 |      ....
--------------------------------
|    4    |  BOB  | 2016-11-18 |      ....
--------------------------------
|    5    |  JON  | 2016-03-03 |      ....
--------------------------------
|    6    |  TIM  | 2016-11-24 |      ....
--------------------------------

Is there a way to do this efficiently in a query without having to create a temporary table and loop over it using script?

This gets me half of the way there:
SELECT distinct(name), min(date), ... other data ... from table1


when I apply the min function to id though it doesn't behave in the way that I need. In the case of BOB it mixes the two rows and returns 1 for 'id' and '2016-11-18' for the date - which is completely incorrect

Any help is much appreciated.

Upvotes: 0

Views: 455

Answers (2)

Philip Kelley
Philip Kelley

Reputation: 40309

SQL Server 2005 and up support windowing functions, and row_number can be singularly useful. This or something very similar should work.

SELECT
   name
  ,date
  ,id
 from (--  Can't reference the windowing function within it's own select,
       --  so we make it a subquery
       select
          name
         ,date
         ,id
         ,row_number() over (partition by name order by date, id) Ranking
        from Table1
      ) sub
 where Ranking = 1
 order by name

Upvotes: 1

Matt
Matt

Reputation: 14341

;WITH cte AS (
    SELECT *
       ,ROW_NUMBER() OVER (PARTITION BY name ORDER BY Date, Id) as RowNum
    FROM
       @Table
)

SELECT *
FROM
    cte
WHERE
    RowNum = 1

So if you think of it slightly different. You want the earliest Date row per name. So if you create a partitioned ROW_NUMBER() on name ordered by Date Ascending and then select where ROW_NUMBER() = 1 you will have your answer. There are ways to do this with the aggregation as well. You can also add id in the order by if you want the first id if the Date is the same as is the case with Tim.

Upvotes: 0

Related Questions