Alex Burr
Alex Burr

Reputation: 51

Selecting duplicate rows with distinct column values

I have a view with rows which may contain the same person multiple times, but with possibly different values in different columns. For example:

ColA   ColB    ColC      Name
------ ------- --------- -------------
497222 5471580 099228813 John Travolta
497222 5615311 099228813 John Travolta
497222 5385303 099228813 John Travolta
497222 5385303 081224467 Tracy Morgan
503373 5471580 033582700 Christine Taylor
503373 5241410 033582700 Christine Taylor
805836 2211223 066491720 Richard Dreyfuss
961922 4030548 066491720 Richard Dreyfuss
503373 5241410 881236698 Erika Eleniak
121214 7723841 072546815 Tilda Swinton
168463 9998994 072546815 Tilda Swinton

In my results I want to return only rows that have multiple occurrences of the ColC value, so I would never want Tracy Morgan or Erika Eleniak in my results.

What I'm looking for is two separate scenarios:

  1. Return the three results John Travolta and two Christine Taylor, based on them each having multiples with the same value for ColA but different values for ColB.
  2. Return the two results Richard Dreyfuss and two Tilda Swinton, based on them each having different values for ColA.

Thanks to a colleague, I think I have a good query for scenario 1:

SELECT ColA, ColB, ColC, Name
FROM MyView
WHERE ColC IN
(
  SELECT ColC
  FROM
  (
    SELECT ColC, COUNT (*)
    FROM MyView
    GROUP BY ColC
    HAVING COUNT (ColC) > 1
  )
);

However I am completely at a loss for handling scenario 2 so any help is appreciated!

Upvotes: 0

Views: 1653

Answers (5)

Del
Del

Reputation: 1599

I think the following query will satisfy all three of your requirements:

WITH test_data (col_a, col_b, col_c, name) AS
(
  SELECT 497222, 5471580, 099228813, 'John Travolta' FROM DUAL UNION ALL
  SELECT 497222, 5615311, 099228813, 'John Travolta' FROM DUAL UNION ALL
  SELECT 497222, 5385303, 099228813, 'John Travolta' FROM DUAL UNION ALL
  SELECT 497222, 5385303, 081224467, 'Tracy Morgan' FROM DUAL UNION ALL
  SELECT 503373, 5471580, 033582700, 'Christine Taylor' FROM DUAL UNION ALL
  SELECT 503373, 5241410, 033582700, 'Christine Taylor' FROM DUAL UNION ALL
  SELECT 805836, 2211223, 066491720, 'Richard Dreyfuss' FROM DUAL UNION ALL
  SELECT 961922, 4030548, 066491720, 'Richard Dreyfuss' FROM DUAL UNION ALL
  SELECT 503373, 5241410, 881236698, 'Erika Eleniak' FROM DUAL UNION ALL
  SELECT 121214, 7723841, 072546815, 'Tilda Swinton' FROM DUAL UNION ALL
  SELECT 168463, 9998994, 072546815, 'Tilda Swinton' FROM DUAL
)
SELECT s.col_a, s.col_b, s.col_c, s.name
FROM (SELECT td.*, 
             COUNT(*) OVER (PARTITION BY td.name, col_c) AS COL_C_COUNT,
             COUNT(*) OVER (PARTITION BY td.name, col_a) AS COL_A_COUNT,
             COUNT(*) OVER (PARTITION BY td.name, col_a, col_b) AS COL_B_COUNT
      FROM test_data td) s
WHERE s.col_c_count > 1 --First check to see if there are multiple values for column C
AND (COL_A_COUNT = 1 --Checks for Different Values in Col A (Dreyfuss/Swinton Case)
     OR COL_B_COUNT = 1); --Checks for Uniqueness in Col B if COL_A_COUNT > 1 (Based on short-circuit)

As mention in the comments, the analytical functions in the inner query get you the information necessary to be able make your determinations of who to return in the outer query.

Upvotes: 1

Thorsten Kettner
Thorsten Kettner

Reputation: 95090

You want to return a row when the person has at least one other row with the same colc value and either a different cola or a different colb. Or so I understand this request. You can use an EXISTS clause for that.

select *
from mytable
where exists
(
  select null
  from mytable other
  where other.rowid <> mytable.rowid
  and other.name = mytable.name
  and other.colc = mytable.colc
  and (other.cola <> mytable.cola or other.colb <> mytable.colb)
)
order by name, cola, colb, colc;

Upvotes: 1

MT0
MT0

Reputation: 168671

You can do your first query with only a single table scan:

SELECT cola,
       colb,
       colc,
       name
FROM   (
  SELECT t.*,
         COUNT(DISTINCT colb) OVER (PARTITION BY colc, cola) AS num_b
  FROM   myview t
)
WHERE  num_b > 1;

Which, for the sample data:

CREATE TABLE myview (cola, colb, colc, name ) AS
SELECT 497222, 5471580, 99228813, 'John Travolta' FROM DUAL UNION ALL
SELECT 497222, 5615311, 99228813, 'John Travolta' FROM DUAL UNION ALL
SELECT 497222, 5385303, 99228813, 'John Travolta' FROM DUAL UNION ALL
SELECT 497222, 5385303, 81224467, 'Tracy Morgan' FROM DUAL UNION ALL
SELECT 503373, 5471580, 33582700, 'Christine Taylor' FROM DUAL UNION ALL
SELECT 503373, 5241410, 33582700, 'Christine Taylor' FROM DUAL UNION ALL
SELECT 805836, 2211223, 66491720, 'Richard Dreyfuss' FROM DUAL UNION ALL
SELECT 961922, 4030548, 66491720, 'Richard Dreyfuss' FROM DUAL UNION ALL
SELECT 503373, 5241410, 881236698, 'Erika Eleniak' FROM DUAL UNION ALL
SELECT 121214, 7723841, 72546815, 'Tilda Swinton' FROM DUAL UNION ALL
SELECT 168463, 9998994, 72546815, 'Tilda Swinton' FROM DUAL

Outputs:

COLA COLB COLC NAME
503373 5241410 33582700 Christine Taylor
503373 5471580 33582700 Christine Taylor
497222 5385303 99228813 John Travolta
497222 5471580 99228813 John Travolta
497222 5615311 99228813 John Travolta

For your second query, you can use the same technique:

SELECT cola,
       colb,
       colc,
       name
FROM   (
  SELECT t.*,
         COUNT(DISTINCT cola) OVER (PARTITION BY colc) AS num_a
  FROM   myview t
)
WHERE  num_a > 1;

Which outputs:

COLA COLB COLC NAME
805836 2211223 66491720 Richard Dreyfuss
961922 4030548 66491720 Richard Dreyfuss
121214 7723841 72546815 Tilda Swinton
168463 9998994 72546815 Tilda Swinton

db<>fiddle here

Upvotes: 1

Sunil Kumar
Sunil Kumar

Reputation: 91

Please check this, Enforced all the conditions necessary for your scenarios that you mentioned.  

WITH MYVIEW AS 
(SELECT 497222  as colA,    5471580 as colB,    99228813    as colC,'John Travolta' AS NAME FROM DUAL UNION ALL
SELECT 497222   as colA,    5615311 as colB,    99228813    as colC,'John Travolta' AS NAME FROM DUAL UNION ALL
SELECT 497222   as colA,    5385303 as colB,    99228813    as colC,'John Travolta' AS NAME FROM DUAL UNION ALL
SELECT 497222   as colA,    5385303 as colB,    81224467    as colC,'Tracy Morgan' AS NAME FROM DUAL UNION ALL
SELECT 503373   as colA,    5471580 as colB,    33582700    as colC,'Christine Taylor' AS NAME FROM DUAL UNION ALL
SELECT 503373   as colA,    5241410 as colB,    33582700    as colC,'Christine Taylor' AS NAME FROM DUAL UNION ALL
SELECT 805836   as colA,    2211223 as colB,    66491720    as colC,'Richard Dreyfuss' AS NAME FROM DUAL UNION ALL
SELECT 961922   as colA,    4030548 as colB,    66491720    as colC,'Richard Dreyfuss' AS NAME FROM DUAL UNION ALL
SELECT 503373   as colA,    5241410 as colB,    881236698   as colC,'Erika Eleniak' AS NAME FROM DUAL UNION ALL
SELECT 121214   as colA,    7723841 as colB,    72546815    as colC,'Tilda Swinton' AS NAME FROM DUAL UNION ALL
SELECT 168463   as colA,    9998994 as colB,    72546815    as colC,'Tilda Swinton' AS NAME FROM DUAL)

--1ST Requirement
SELECT 'Req-1' as Req, cola, colb, colc, name FROM MYVIEW WHERE COLC IN (
SELECT COLC
FROM MYVIEW
GROUP BY COLC
HAVING COUNT(COLA)>1 AND COUNT(DISTINCT COLA)=1 AND COUNT(DISTINCT COLB)>1 AND COUNT(COLC)>1
)
--2nd Requirement
union all
SELECT 'Req-2' as Req, cola, colb, colc, name FROM MYVIEW WHERE COLC IN (
SELECT COLC
FROM MYVIEW
GROUP BY COLC
HAVING COUNT(DISTINCT COLA)>1 AND COUNT(COLC)>1
)
;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270993

You can use exists:

select v.*
from myview v
where exists (select 1
              from myview v2
              where v2.name = v.name and
                    v2.colc <> v.colc
             );

Upvotes: 1

Related Questions