SQL Finding duplicate values in two of the three columns of each row

Question

Let's say we have three columns: A, B, and C.

I would like to filter the results as follows:

The values of A and B are the same (duplicated) for > 1 (more than 1) row, and the value of C is always different.

In the attached image, the values that appear selected would meet the conditions mentioned above.

What I've tried:

SELECT 
  a.notation as A, a.gene as B, b.id as C
FROM 
  `db-dummy`.sgdata c 
  join `db-dummy`.g_info a on a.rec_id = c.gen_id 
  join `db-dummy`.spec_data b on b.rec_id = c.spec_id GROUP BY A, B HAVING COUNT(*) > 1;

I thought that using GROUP BY and HAVING COUNT(*) > 1 I could get the desired result, but I get the following error:

SQL Error [1055] [42000]: (conn=1632) Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'db-dummy.b.spec_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Gordon Linoff · Accepted Answer

If you had a single table, I would suggest just using exists. But because you have a join, use window functions. If you are. looking for different values of id:

SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
             MIN(b.id) OVER (PARTITION BY a.notation, a.gene) as min_id,
             MAX(b.id) OVER (PARTITION BY a.notation, a.gene) as max_id
      FROM `db-dummy`.sgdata c JOIN
           `db-dummy`.g_info a 
           ON a.rec_id = c.gen_id JOIN
           `db-dummy`.spec_data b 
           ON b.rec_id = c.spec_id
     ) x
WHERE min_id <> max_id;

If you are just looking for multiple rows for a given A and B, then you can use:

SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
             COUNT(*) OVER (PARTITION BY a.noation, a.gene) as cnt
      FROM `db-dummy`.sgdata c JOIN
           `db-dummy`.g_info a 
           ON a.rec_id = c.gen_id JOIN
           `db-dummy`.spec_data b 
           ON b.rec_id = c.spec_id
     ) x
WHERE cnt > 1;

SQL Finding duplicate values in two of the three columns of each row

Answers (2)

Related Questions