Sai Kiran
Sai Kiran

Reputation: 1

How to Identify unique and duplicate records using sql statement

I have a table which has multiple values in a particular column, I am trying to write two sql query which will identify the entire set of duplicates and entire set of unique records, I wrote few queries but in the unique set I am getting one of the record from the duplicate set.

sample data,

pay_id, pay_ratio, pay_type, cor_id
123,    12,        C,        Annual
123,    12,        C,        Annual
456,    13,        A,        Semi-Annual
476,    43,        B,        Monthly
987,    32,        H,        Daily
987,    32,        H,        Daily

I am trying to separate the above data set as below.

Unique Data Set

pay_id, pay_ratio, pay_type, cor_id
456,    13,        A,        Semi-Annual
476,    43,        B,        Monthly

Duplicate Data set

pay_id, pay_ratio, pay_type, cor_id
123,    12,        C,        Annual
123,    12,        C,        Annual
987,    32,        H,        Daily
987,    32,        H,        Daily

can someone suggest me how can I achieve this using sql query.

Regards, Sai

Upvotes: 0

Views: 289

Answers (3)

Edward
Edward

Reputation: 362

Group and count:

SELECT
  pay_id, pay_ratio, pay_type, cor_id,
  COUNT(*) AS [Duplicates]
FROM table
GROUP BY pay_id, pay_ratio, pay_type, cor_id
HAVING COUNT(*) = 0 /* unique; > 0 - duplicates */
ORDER BY COUNT(*) ASC

Upvotes: 0

SqlZim
SqlZim

Reputation: 38023

using a common table expression with row_number()

Unique:

;with cte as (
  select *
    , rn = row_number() over (partition by pay_id order by pay_type)
  from t
)
select *
from cte
where not exists (
  select 1
  from cte i
  where i.pay_id = cte.pay_id
    and i.rn > 1
  )

rextester demo: http://rextester.com/LHIGTY81886

returns:

+--------+-----------+----------+-------------+----+
| pay_id | pay_ratio | pay_type |   cor_id    | rn |
+--------+-----------+----------+-------------+----+
|    456 |        13 | A        | Semi-Annual |  1 |
|    476 |        43 | B        | Monthly     |  1 |
+--------+-----------+----------+-------------+----+

Duplicates:

;with cte as (
  select *
    , rn = row_number() over (partition by pay_id order by pay_type)
  from t
)
select *
from cte
where exists (
  select 1
  from cte i
  where i.pay_id = cte.pay_id
    and i.rn > 1
  )

returns:

+--------+-----------+----------+--------+----+
| pay_id | pay_ratio | pay_type | cor_id | rn |
+--------+-----------+----------+--------+----+
|    123 |        12 | C        | Annual |  1 |
|    123 |        12 | C        | Annual |  2 |
|    987 |        32 | H        | Daily  |  1 |
|    987 |        32 | H        | Daily  |  2 |
+--------+-----------+----------+--------+----+

Upvotes: 0

Giorgos Betsos
Giorgos Betsos

Reputation: 72175

You can do this using COUNT() OVER():

SELECT pay_id, pay_ratio, pay_type, cor_id,
       CASE 
          WHEN COUNT(*) OVER (PARTITION BY pay_id, pay_ratio, pay_type, cor_id) = 1 
             THEN 'unique'
          ELSE 'dupl'
       END AS type
FROM mytable

The above query returns 'unique' for unique records and 'dupl' for duplicates. You can wrap the query in a CTE or subquery and filter it as you like.

Note: The above query is based on the assumption that all 4 fields of the table determine a duplicate record. You may change the PARTITION BY clause as you like to address some other duplicate 'logic'.

Upvotes: 3

Related Questions