Californium
Californium

Reputation: 591

Select all rows where two columns have the same value?

I'm working with SQL and was wondering how I would get all of the rows where values in 2 columns are equal. For example, imagine this table:

+----+---------+
| ID | Version |
+----+---------+
| AB |       1 |
| AB |       1 |
| BA |       2 |
| BA |       2 |
| CB |       1 |
+----+---------+

I want to select all rows where the IDs and versions match other rows with the same values in their ID and Version columns. In other words, I want to find duplicate values. So the desired output would be:

+----+---------+
| ID | Version |
+----+---------+
| AB |       1 |
| AB |       1 |
| BA |       2 |
| BA |       2 |
+----+---------+

How would I go about doing this as efficiently as possible in a table with over a million rows?

Upvotes: 1

Views: 3014

Answers (2)

astentx
astentx

Reputation: 6751

If you need a duplicates count per group, then use GROUP BY ... HAVING. If you need an overall count of duplicated rows, then use another aggregation over the group by.

For Oracle (fiddle) it can be done in single step:

with a as (
  select 1 as id, 'A' as v from dual union all
  select 1 as id, 'A' as v from dual union all
  select 1 as id, 'B' as v from dual union all
  select 1 as id, 'B' as v from dual union all
  select 2 as id, 'C' as v from dual
)
select  sum(count(1)) as total_duplicates
  , count(count(1)) as duplicate_groups
from a
group by id, v
having count(1) > 1

For SQL Server (fiddle), for example, it does not work, so I've added another select over it

with a as (
  select 1 as id, 'A' as v  union all
  select 1 as id, 'A' as v  union all
  select 1 as id, 'B' as v  union all
  select 1 as id, 'B' as v  union all
  select 2 as id, 'C' as v 
)
select sum(cnt) as total_duplicates
  , count(cnt) as duplicate_groups
from (
  select count(1) as cnt
  from a
  group by id, v
  having count(1) > 1
) as q
| total_duplicates | duplicate_groups |
+-------------------------------------+
| 4                | 2                | 

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269453

The simplest method are probably window functions:

select t.*
from (select t.*,
             count(*) over (partition by id, version) as cnt
      from t
     ) t
where cnt >= 2;

If you have an index on (id, version) (or (version, id)), then the database engine should be able to take advantage of that.

Upvotes: 3

Related Questions