user7792598
user7792598

Reputation: 197

Oracle SQL How to find duplicate values in different columns?

I have a set of rows with many columns. For example,

ID | Col1 | Col2 | Col3 | Duplicate
------------------------------------
81 | 101  | 102  | 101  | YES
82 | 101  | 103  | 104  | NO

I need to calculate the "Duplicate" column. It is duplicate because it has the same value in Col1 and Col3. I know there is the LEAST function, which is similar to the MIN function but with columns. Does something similar to achieve this exists?

The approach I have in mind is to write all possible combinations in a case like this:

SELECT ID, col1, col2, col3, 
       CASE WHEN col1 = col2 or col1 = col3 or col2 = col3 then 1 else 0 end as Duplicate
FROM table

But, I wish to avoid that, since I have too many columns in some cases, and is very prone to errors.

What is the best way to solve this?

Upvotes: 0

Views: 2480

Answers (6)

Dheeraj kumar
Dheeraj kumar

Reputation: 1

I think you want to use fresh data that doesnot contains any duplicate values inside table if it right then use SELECT DISTINCT statement like

SELECT DISTINCT * FROM TABLE_NAME

It will conatins duplicate free data,
Note: It will also applicable for a particular column like

SELECT DISTINCT col1 FROM TABLE_NAME

Upvotes: 0

fg78nc
fg78nc

Reputation: 5232

SELECT ID, col1, col2, 
    NVL2(NULLIF(col1, col2), 'Not duplicate', 'Duplicate')
       FROM table;

If you want to compare more than 2 columns can implement same logic with COALESCE

Upvotes: 0

Bill Karwin
Bill Karwin

Reputation: 562931

The best way is to avoid storing repeating groups of columns. If you have multiple columns that essentially store comparable data (i.e. a multi-valued attribute), move the data to a dependent table, and use one column.

CREATE TABLE child (
 ref_id INT,
 col INT
);

INSERT INTO child VALUES
(81, 101), (81, 102), (81, 101),
(82, 101), (82, 103), (82, 104);

Then it's easier to find cases where a value occurs more than once:

SELECT id, col, COUNT(*)
FROM child
GROUP BY id, col
HAVING COUNT(*) > 1;

If you can't change the structure of the table, you could simulate it using UNIONs:

SELECT id, col1, COUNT(*)
FROM (
    SELECT id, col1 AS col FROM mytable
    UNION ALL SELECT id, col2 FROM mytable
    UNION ALL SELECT id, col3 FROM mytable
    ... for more columns ...
) t
GROUP BY id, col
HAVING COUNT(*) > 1;

Best for the query you are trying to run. A denormalized storage strategy might be better for some other types of queries.

Upvotes: 0

user5683823
user5683823

Reputation:

Something like this... note that in the lateral clause we still need to unpivot, but that is one row at a time - resulting in possibly much faster execution than simple unpivot and standard aggregation.

with
     input_data ( id, col1, col2, col3 ) as (
       select 81, 101, 102, 101 from dual union all
       select 82, 101, 103, 104 from dual
     )
-- End of simulated input data (for testing purposes only).
-- Solution (SQL query) begins BELOW THIS LINE.
select i.id, i.col1, i.col2, i.col3, l.duplicates
from   input_data i,
         lateral ( select  case when count (distinct val) = count(val) 
                                then 'NO' else 'YES'
                           end  as duplicates
                   from    input_data
                   unpivot ( val for col in ( col1, col2, col3 ) )
                   where   id = i.id
                 ) l
;

ID  COL1  COL2  COL3  DUPLICATES
--  ----  ----  ----  ----------
81   101   102   101  YES
82   101   103   104  NO 

Upvotes: 1

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49270

You can do this by unpivoting and then counting the distinct values per id and checking if it equals the number of rows for that id. Equal means there are no duplicates. Then left join this result to the original table to caclulate the duplicate column.

SELECT t.*,
       CASE WHEN x.id IS NOT NULL THEN 'Yes' ELSE 'No' END AS duplicate
FROM t
LEFT JOIN
  (SELECT id
   FROM
     (SELECT *
      FROM t 
      unpivot (val FOR col IN (col1,col2,col3)) u 
     ) t
   GROUP BY id
   HAVING count(*)<>count(DISTINCT val)
  ) x ON x.id=t.id

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271121

Hmmm. You are looking for within-row duplicates. This is painful. More recent versions of Oracle support lateral joins. But for just a handful of non-NULL columns, you can do:

select id, col1, col2, col3,
       (case when col1 in (col2, col3) or col2 in (col3) then 1 else 0 end) as Duplicate
from t;

For each additional column, you need to add one more in comparison and update the other in-lists.

Upvotes: 2

Related Questions