Reputation: 251

How to remove unique rows and KEEP duplicates? SQL

I am trying to run a custom sql query to select only the duplicate rows within the dataset, excluding the unique results.

I am trying to display only the duplicated rows for each project, i.e. if for a project (car) the user added a product (window) twice. While the window might be contained in many other projects (house, building, etc...) the condition is duplication within a project, not the entire data set.

is it possible to write a query for this?

UPDATE: duplicates

As you can see on the image below, projects (id) 500 and 505 contain duplicate parts in them (window and hood), so what i would like to display is just the results of projects (ids) that contain duplicates. So the final view would not have projects 501, 502, 503, 504, 506. And the ones that are duplicated would only display the rows that contain the duplicated field, i.e. project 500 (id) would only display rows 100 (scope) and 108, and delete 101, 102 and 114 which are unique. As seen below, part OC01 (highlighted in red) can also show on other projects(id),but project 503 should not be displayed because OC01 is not duplicated within that id.

The data is contained in an excel file, which is then manipulated by a BI software called Tableau which lets me write custom sql to extract the data from the xls in different ways.

Upvotes: 1

Answers (3)

Gordon Linoff

Reputation: 1271003

You can do this easily with window/analytic functions:

select t.*
from (select t.*, count(*) over (partition by project, product) as cnt
      from table t
     ) t
where cnt > 1;

These functions are supported in most databases. If they are not available, you can do the logic with either a correlated subquery or by joining in a group by query.

EDIT:

I don't know if Tableau supports the above query. You can try this:

select t.*
from table t1
where 1 < (select count(*)
           from table t2
           where t2.project = t.project and t2.product = t.product
          );

Perhaps this version will work:

select t.*
from table t join
     (select project, product, count(*) as cnt
      from table t
      group by project, product
     ) pp
     on t.project = pp.project and t.product = pp.product and cnt > 1;

Upvotes: 3

dnoeth

Reputation: 60502

If your DBMS doesn't support the Windowed Aggregate Functions used by Gordon Linoff's solution you can do:

select t1.*
from table t1
join
  (select project, product
   from table
   group by project, product
   having count(*) > 1
  ) t2
on t1.project = t2.project
and t1.product = t2.Product;

Edit based on updated query:

SELECT *
FROM tab t1
WHERE EXISTS
 (
   SELECT * FROM tab t2
   WHERE t1.id = t2.id
     AND t1.part = t2.part
     AND t1.scope <> t2.scope
 );

Upvotes: 1

hot_whisky

Reputation: 1102

SELECT * FROM TABLE1 
WHERE project 
IN (SELECT X.project FROM (
    SELECT project, product,COUNT(product)
    FROM TABLE1
    GROUP BY project,product
    HAVING COUNT(product) > 1)X)
AND product IN (SELECT X.product FROM (
    SELECT project, product,COUNT(product)
    FROM TABLE1
    GROUP BY project,product
    HAVING COUNT(product) > 1)X)

Upvotes: 1

How to remove unique rows and KEEP duplicates? SQL

Answers (3)

Related Questions