Reputation: 107

SQL Server: Only last entry in GROUP BY

I have the following table in MSSQL2005

id | business_key | result
1 | 1 | 0
2 | 1 | 1
3 | 2 | 1
4 | 3 | 1
5 | 4 | 1
6 | 4 | 0

And now i want to group based on the business_key returning the complete entry with the highest id. So my expected result is:

business_key | result
1 | 1
2 | 1
3 | 1
4 | 0

I bet that there is a way to achieve that, i just can't see it at the moment.

Upvotes: 8

Answers (5)

DBNewbie_2007

Reputation: 21

This is an older post but was relevant to something I was doing currently (2013). If you get a larger dataset (typical in most DBs), the performance of the various queries (looking at execution plans) says a lot. First we create a "TALLY table" to randomly generate numbers, then use an arbitrary formula to create data for the "MyTable":

CREATE TABLE #myTable(
    [id] [int] NOT NULL,
    [business_key] [int] NOT NULL,
    [result] [int] NOT NULL,
    PRIMARY KEY (Id)
) ON [PRIMARY];

; WITH
    -- Tally table Gen            Tally Rows:     X2                X3
t1 AS (SELECT 1 N UNION ALL SELECT 1 N),    -- 4            ,    8
t2 AS (SELECT 1 N FROM t1 x, t1 y),            -- 16            ,    64
t3 AS (SELECT 1 N FROM t2 x, t2 y),            -- 256            ,    4096
t4 AS (SELECT 1 N FROM t3 x, t3 y),            -- 65536        ,    16,777,216
t5 AS (SELECT 1 N FROM t4 x, t4 y),            -- 4,294,967,296,    A lot
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
          FROM t5 x, t5 y)

INSERT INTO #MyTable 
SELECT N, CAST(N/RAND(N/8) AS bigINT)/5 , N%2
FROM Tally
WHERE N < 500000

Next we run three different types of queries to review the performance (turn on "Actual Execution Plan" if you are using SQL Server Management Studio):

SET STATISTICS IO ON
SET STATISTICS TIME ON
----- Try #1 
select  'T1' AS Qry, id, business_key, 
        result
from    #myTable
where   id in 
        (select max(id)
        from    #myTable
        group by business_key)

---- Try #2 
select 'T2' AS Qry, id, business_key, 
       result
    from 
    (select id, 
        business_key, 
        result, 
        max(id) over (partition by business_key) as max_id
    from #mytable) x
where id = max_id

---- Try #3 
;with cteRowNumber as (
    select id, 
        business_key, 
        result,
           row_number() over(partition by business_key order by id desc) as RowNum
        from #mytable
)

SELECT 'T3' AS Qry, id, business_key, 
       result
FROM cteRowNumber
WHERE RowNum = 1

Cleanup:

IF OBJECT_ID(N'TempDB..#myTable',N'U') IS NOT NULL 
    DROP TABLE #myTable;
    SET STATISTICS IO OFF
SET STATISTICS TIME OFF

You will find, looking at the execution plans, "Try 1" has the best "Query Cost" and lowest CPU time but "Try 3" has the least reads and CPU time is not too bad. I would recommend using a CTE method for the least reads

Upvotes: 2

Chris Simpson

Reputation: 8000

select
  drv.business_key,
  mytable.result
from mytable
  inner join
  (
    select 
      business_key, 
      max(id) as max_id
    from mytable
    group by
      business_key
  ) as drv on
    mytable.id = drv.max_id

Upvotes: 5

Dave Curley

Reputation:

select business_key, 
       result
    from 
    (select id, 
        business_key, 
        result, 
        max(id) over (partition by business_key) as max_id
    from mytable) x
where id = max_id

Upvotes: 3

John MacIntyre

Reputation: 13031

Try this

select  business_key, 
        result
from    myTable
where   id in 
        (select max(id)
        from    myTable
        group by business_key)

EDIT: I created the table to test my code. I include it below in case anybody else wants to test it.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[myTable](
    [id] [int] NOT NULL,
    [business_key] [int] NOT NULL,
    [result] [int] NOT NULL
) ON [PRIMARY]
go

insert into myTable values(1,1,0);
insert into myTable values(2,1,1);
insert into myTable values(3,2,1);
insert into myTable values(4,3,1);
insert into myTable values(5,4,1);
insert into myTable values(6,4,0);

select  * from mytable

Upvotes: 3

Tom H

Reputation: 47402

An alternative solution, which may give you better performance (test both ways and check the execution plans):

SELECT
     T1.id,
     T1.business_key,
     T1.result
FROM
     dbo.My_Table T1
LEFT OUTER JOIN dbo.My_Table T2 ON
     T2.business_key = T1.business_key AND
     T2.id > T1.id
WHERE
     T2.id IS NULL

This query assumes that the ID is a unique value (at least for any given business_key) and that it is set to NOT NULL.

Upvotes: 15

SQL Server: Only last entry in GROUP BY

Answers (5)

Related Questions