user338804
user338804

Reputation: 61

Filter Duplicate Rows on Conditions

I would like to filter duplicate rows on conditions so that the rows with minimum modified and maximum active and unique rid and did are picked. self join? or any better approach that would be performance wise better?

Example:

id        rid                  modified                 active         did
1             1             2010-09-07 11:37:44.850              1             1
2             1             2010-09-07 11:38:44.000              1             1
3             1             2010-09-07 11:39:44.000              1             1
4             1             2010-09-07 11:40:44.000              0             1
5             2             2010-09-07 11:41:44.000              1             1
6             1             2010-09-07 11:42:44.000              1             2

Output expected is

1             1             2010-09-07 11:37:44.850              1             1
5             2             2010-09-07 11:41:44.000              1             1
6             1             2010-09-07 11:42:44.000              1             2

Commenting on the first answer, the suggestion does not work for the below dataset(when active=0 and modified is the minimum for that row)

 id        rid                     modified                      active           did
    1             1             2010-09-07 11:37:44.850              1             1
    2             1             2010-09-07 11:38:44.000              1             1
    3             1             2010-09-07 11:39:44.000              1             1
    4             1             2010-09-07 11:36:44.000              0             1
    5             2             2010-09-07 11:41:44.000              1             1
    6             1             2010-09-07 11:42:44.000              1             2

Upvotes: 4

Views: 281

Answers (3)

ErikE
ErikE

Reputation: 50271

You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:

SELECT
   X.*
FROM
   ParentTable P
   CROSS APPLY (
      SELECT TOP 1 *
      FROM YourTable T
      WHERE P.rid = T.rid AND P.did = T.did
      ORDER BY active DESC, modified
   ) X

Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.

Also, here is my crazy, single scan magic query which can often outperform other methods:

SELECT
   id = Substring(Packed, 6, 4),
   rid,
   modified = Convert(datetime, Substring(Packed, 2, 4)),
   Active = Convert(bit, 1 - Substring(Packed, 1, 1)),
   did,
FROM
   (
      SELECT
         rid,
         did,
         Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(4), modified) + Convert(binary(4), id)
      FROM
         YourTable
      GROUP BY
         rid,
         did
   ) X

This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.

Upvotes: 0

Martin Smith
Martin Smith

Reputation: 453898

Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.

;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id

Upvotes: 2

Jeremy
Jeremy

Reputation: 122

select id, rid, min(modified), max(active), did from foo group by rid, did order by id;

Upvotes: 0

Related Questions