Reputation: 251
I am trying to run a custom sql query to select only the duplicate rows within the dataset, excluding the unique results.
I am trying to display only the duplicated rows for each project, i.e. if for a project (car) the user added a product (window) twice. While the window might be contained in many other projects (house, building, etc...) the condition is duplication within a project, not the entire data set.
is it possible to write a query for this?
UPDATE:
As you can see on the image below, projects (id) 500 and 505 contain duplicate parts in them (window and hood), so what i would like to display is just the results of projects (ids) that contain duplicates. So the final view would not have projects 501, 502, 503, 504, 506. And the ones that are duplicated would only display the rows that contain the duplicated field, i.e. project 500 (id) would only display rows 100 (scope) and 108, and delete 101, 102 and 114 which are unique. As seen below, part OC01 (highlighted in red) can also show on other projects(id),but project 503 should not be displayed because OC01 is not duplicated within that id.
The data is contained in an excel file, which is then manipulated by a BI software called Tableau which lets me write custom sql to extract the data from the xls in different ways.
Upvotes: 1
Views: 6399
Reputation: 1271003
You can do this easily with window/analytic functions:
select t.*
from (select t.*, count(*) over (partition by project, product) as cnt
from table t
) t
where cnt > 1;
These functions are supported in most databases. If they are not available, you can do the logic with either a correlated subquery or by joining in a group by
query.
EDIT:
I don't know if Tableau supports the above query. You can try this:
select t.*
from table t1
where 1 < (select count(*)
from table t2
where t2.project = t.project and t2.product = t.product
);
Perhaps this version will work:
select t.*
from table t join
(select project, product, count(*) as cnt
from table t
group by project, product
) pp
on t.project = pp.project and t.product = pp.product and cnt > 1;
Upvotes: 3
Reputation: 60502
If your DBMS doesn't support the Windowed Aggregate Functions used by Gordon Linoff's solution you can do:
select t1.*
from table t1
join
(select project, product
from table
group by project, product
having count(*) > 1
) t2
on t1.project = t2.project
and t1.product = t2.Product;
Edit based on updated query:
SELECT *
FROM tab t1
WHERE EXISTS
(
SELECT * FROM tab t2
WHERE t1.id = t2.id
AND t1.part = t2.part
AND t1.scope <> t2.scope
);
Upvotes: 1
Reputation: 1102
SELECT * FROM TABLE1
WHERE project
IN (SELECT X.project FROM (
SELECT project, product,COUNT(product)
FROM TABLE1
GROUP BY project,product
HAVING COUNT(product) > 1)X)
AND product IN (SELECT X.product FROM (
SELECT project, product,COUNT(product)
FROM TABLE1
GROUP BY project,product
HAVING COUNT(product) > 1)X)
Upvotes: 1