aherrick
aherrick

Reputation: 20169

SQL Server eliminate a row with "duplicate" data

I have data in the following format:

DATE                 DATA1                 DATA2
-------------------------------------------------
20121010             ABC                   DEF
20121010             DEF                   ABC
20121010             HIJ                   KLM
20121010             KLM                   HIJ
20121212             ABC                   DEF
20121212             DEF                   ABC
20121212             HIJ                   KLM
20121212             KLM                   HIJ

What I want to do is select rows 1 and 3. I don't care about rows 2 and 4 because they are essentially "duplicates" in my eyes.

Seems simple but I'm just trying to put the query together to accomplish this.

Upvotes: 2

Views: 158

Answers (5)

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115530

SELECT Date,  Data1, Data2
FROM tableX
WHERE Data1 <= Data2

UNION 

SELECT Date,  Data2, Data1
FROM tableX
WHERE Data2 < Data1 ;

Upvotes: 1

Oleksandr Fedorenko
Oleksandr Fedorenko

Reputation: 16904

 ;WITH cte AS
 (
  SELECT DATE, DATA1, DATA2, ROW_NUMBER() OVER(ORDER BY DATE, DATA1) AS Id
  FROM dbo.test70 t
  )
  SELECT c.DATE, c.DATA1, c.DATA2
  FROM cte c LEFT JOIN cte ct ON c.Id = ct.Id + 1 AND c.DATA1 = ct.DATA2 
  WHERE ct.DATE IS NULL

Demo on SQLFiddle

Upvotes: 0

rs.
rs.

Reputation: 27427

You can try this:

SQL DEMO HERE

;WITH CTE AS
(
  SELECT date, data1, data2, 
  CASE WHEN DATA2 > DATA1 THEN data1 + data2 
  ELSE data2 + data1 END d
  FROM T
),CTE2 AS 
(
  SELECT *,
  ROW_NUMBER() OVER (PARTITION BY DATE, d 
                     ORDER BY data1) rn
  FROM CTE
)
SELECT date, data1, data2 FROM CTE2 WHERE rn = 1

Upvotes: 0

whytheq
whytheq

Reputation: 35557

I like it like this: but I believe it won't work if you're using pre SQL-Server 2008R2

;WITH myCTE
        AS
        (
        SELECT 
            [Rw] = ROW_NUMBER() OVER (ORDER BY [Date]),
            [Date],
            [DATA1],
            [DATA2]
        FROM #data
        )
SELECT *
FROM myCTE x    
WHERE NOT EXISTS
            (
            SELECT 1
            FROM myCTE y
            WHERE
                x.DATA1 = y.DATA2 AND
                x.Rw > y.Rw AND
                x.Date = y.Date
            )

Here's a live illustration on SQL Fiddle

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269773

You can use the row_number() function for this, assuming you are using version 2005 or higher:

select date, data1, data2
from (select t.*,
             row_number() over (partition by date order by date) as seqnum
      from t
     ) t
where seqnum = 1

The expression order by date should produce an arbitrary ordering in any database that supports row_number. In SQL Server, you can also use order by (select NULL).

Or, I realize that your question may be about eliminate duplicates, regardless of order. For that, you can do:

select distinct date, minData, maxData
from (select t.date,
             (case when data1 > data2 then data1 else data2 end) as minData,
             (case when data1 > data2 then data2 else data1 end) as maxData
      from t
     ) t

This might, however, rearrange the two values, when only one row appears.

The more complicated solution to maintain the original ordering of the columns and eliminate the additional rows combines the two approaches:

select date, data1, data2
from (select t.*,
             row_number() over (partition by date order by minData, maxData) as seqnum
      from (select t.*
                   (case when data1 > data2 then data1 else data2 end) as minData,
                   (case when data1 > data2 then data2 else data1 end) as maxData
            from t
           ) t
     ) t
where seqnum = 1

Upvotes: 1

Related Questions