MySQL - Finding Duplicated Data from Two Columns

Question

I have an arbitrarily large MySQL table where there are duplicated rows, however to determine which rows are duplicated I need to match the data from two columns. A modified snippet of the table is below.

mysql> select * from DATA_STATUS where METADATA_ID='6ac00785-abcd-3f4a-defg-12b8ed23abff';
+--------+------------+--------------------------------------+-------------+
| ID     | STATUS     |  METADATA_ID                         | METADATA_FK |
+--------+------------+--------------------------------------+-------------+
| 1      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 2      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 3      |          0 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 4      |          0 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 5      |          1 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 6      |          2 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+

I want to do a select on the entire table where there are multiple of the same METADATA_ID where the duplicated METADATA_ID rows have a STATUS of 3. I know how to query a table for duplicates in one column, but am struggling to figure out how to match on duplicates and other conditions.

From the example data, the row IDs that match this condition are 1 and 2 but not 3.

EDIT: Additional information for clarification and TL;DR conditions

The overall criteria for a duplicate is STATUS=3 and METADATA_ID > 1, the snippet below shows the rows that meet this.

+--------+------------+--------------------------------------+-------------+
| ID     | STATUS     |  METADATA_ID                         | METADATA_FK |
+--------+------------+--------------------------------------+-------------+
| 1      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 2      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+

I want the query to either pull back just one row that contains the ID, STATUS and METADATA_ID (METADATA_FK is optional) when a duplicate is found, or all instances of the duplication, either is fine. The data is not counted as a duplicate if STATUS is not 3 or the METADATA_ID only exists once in the table.

Mattia Nocerino · Accepted Answer

Try this:

select * 
from yourtable
where 
  status_id = 3 and 
  metadata_id in (
        select metadata_id 
        from yourtable
        where status_id = 3 
        group by metadata_id 
        having count(*) > 1
  );

Working example

If not all rows are necessary you can use this simple query:

select * from yourtable where status_id = 3 group by metadata_id having count(*) > 1;

MySQL - Finding Duplicated Data from Two Columns

Answers (2)

Related Questions