Reputation: 107
I have the following table in MSSQL2005
id | business_key | result
1 | 1 | 0
2 | 1 | 1
3 | 2 | 1
4 | 3 | 1
5 | 4 | 1
6 | 4 | 0
And now i want to group based on the business_key returning the complete entry with the highest id. So my expected result is:
business_key | result
1 | 1
2 | 1
3 | 1
4 | 0
I bet that there is a way to achieve that, i just can't see it at the moment.
Upvotes: 8
Views: 19407
Reputation: 21
This is an older post but was relevant to something I was doing currently (2013). If you get a larger dataset (typical in most DBs), the performance of the various queries (looking at execution plans) says a lot. First we create a "TALLY table" to randomly generate numbers, then use an arbitrary formula to create data for the "MyTable":
CREATE TABLE #myTable(
[id] [int] NOT NULL,
[business_key] [int] NOT NULL,
[result] [int] NOT NULL,
PRIMARY KEY (Id)
) ON [PRIMARY];
; WITH
-- Tally table Gen Tally Rows: X2 X3
t1 AS (SELECT 1 N UNION ALL SELECT 1 N), -- 4 , 8
t2 AS (SELECT 1 N FROM t1 x, t1 y), -- 16 , 64
t3 AS (SELECT 1 N FROM t2 x, t2 y), -- 256 , 4096
t4 AS (SELECT 1 N FROM t3 x, t3 y), -- 65536 , 16,777,216
t5 AS (SELECT 1 N FROM t4 x, t4 y), -- 4,294,967,296, A lot
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
FROM t5 x, t5 y)
INSERT INTO #MyTable
SELECT N, CAST(N/RAND(N/8) AS bigINT)/5 , N%2
FROM Tally
WHERE N < 500000
Next we run three different types of queries to review the performance (turn on "Actual Execution Plan" if you are using SQL Server Management Studio):
SET STATISTICS IO ON
SET STATISTICS TIME ON
----- Try #1
select 'T1' AS Qry, id, business_key,
result
from #myTable
where id in
(select max(id)
from #myTable
group by business_key)
---- Try #2
select 'T2' AS Qry, id, business_key,
result
from
(select id,
business_key,
result,
max(id) over (partition by business_key) as max_id
from #mytable) x
where id = max_id
---- Try #3
;with cteRowNumber as (
select id,
business_key,
result,
row_number() over(partition by business_key order by id desc) as RowNum
from #mytable
)
SELECT 'T3' AS Qry, id, business_key,
result
FROM cteRowNumber
WHERE RowNum = 1
Cleanup:
IF OBJECT_ID(N'TempDB..#myTable',N'U') IS NOT NULL
DROP TABLE #myTable;
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
You will find, looking at the execution plans, "Try 1" has the best "Query Cost" and lowest CPU time but "Try 3" has the least reads and CPU time is not too bad. I would recommend using a CTE method for the least reads
Upvotes: 2
Reputation: 8000
select
drv.business_key,
mytable.result
from mytable
inner join
(
select
business_key,
max(id) as max_id
from mytable
group by
business_key
) as drv on
mytable.id = drv.max_id
Upvotes: 5
Reputation:
select business_key,
result
from
(select id,
business_key,
result,
max(id) over (partition by business_key) as max_id
from mytable) x
where id = max_id
Upvotes: 3
Reputation: 13031
Try this
select business_key,
result
from myTable
where id in
(select max(id)
from myTable
group by business_key)
EDIT: I created the table to test my code. I include it below in case anybody else wants to test it.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[myTable](
[id] [int] NOT NULL,
[business_key] [int] NOT NULL,
[result] [int] NOT NULL
) ON [PRIMARY]
go
insert into myTable values(1,1,0);
insert into myTable values(2,1,1);
insert into myTable values(3,2,1);
insert into myTable values(4,3,1);
insert into myTable values(5,4,1);
insert into myTable values(6,4,0);
select * from mytable
Upvotes: 3
Reputation: 47402
An alternative solution, which may give you better performance (test both ways and check the execution plans):
SELECT
T1.id,
T1.business_key,
T1.result
FROM
dbo.My_Table T1
LEFT OUTER JOIN dbo.My_Table T2 ON
T2.business_key = T1.business_key AND
T2.id > T1.id
WHERE
T2.id IS NULL
This query assumes that the ID is a unique value (at least for any given business_key) and that it is set to NOT NULL.
Upvotes: 15