ralphje
ralphje

Reputation: 995

Selecting column pairs where one of the columns occurs in at least two distinct pairs

I wish to select a distinct pair of values from a MySQL table, where the value of one column occurs at least twice.

For example, I have the following table:

+----+---------+----------+-----------+
| id | machine | filename | ipaddress |
+----+---------+----------+-----------+
| 1  | X       | /tmp/1   | 1.2.3.4   |
| 2  | X       | /tmp/2   | 1.2.3.4   |
| 3  | X       | /tmp/3   | 2.3.4.5   |
| 4  | Y       | /tmp/4   | 2.3.4.5   |
| 5  | Z       | /tmp/5   | 2.3.4.5   |
| 6  | Z       | /tmp/6   | 2.3.4.5   |
+----+---------+----------+-----------+

I wish to select all rows where the specific ipaddress is present in at least two machines. Only one machine-ipaddress-pair is required, so the result would be:

+---------+-----------+
| machine | ipaddress |
+---------+-----------+
| X       | 2.3.4.5   |
| Y       | 2.3.4.5   |
| Z       | 2.3.4.5   |
+---------+-----------+

I have tried grouping on machine and ipaddress:

SELECT machine, ipaddress FROM table 
GROUP BY machine, ipaddress HAVING count(*) > 1

However, this resulted in (X, 1.2.3.4) to be included too.

I also attempted to use a subquery, but this only returned any ipaddress only once:

SELECT machine, ipaddress FROM 
    (SELECT machine, ipaddress FROM table GROUP BY machine, ipaddress) t 
GROUP BY ipaddress HAVING count(*) > 1

I have trouble finding a proper query. It should be noted that the table is rather large and an efficient query is somewhat required.

Upvotes: 0

Views: 86

Answers (2)

dennislloydjr
dennislloydjr

Reputation: 960

I don't have MySQL in front of me at the moment, but I tried the following in SQL Server and it seems to work for your problem. Be sure to write some unit tests to validate:

SELECT DISTINCT machine, ipaddress
FROM [table] t1
WHERE EXISTS (
 SELECT 1
   FROM [table] t2
  WHERE t1.ipaddress = t2.ipaddress
  GROUP BY ipaddress
 HAVING COUNT(DISTINCT machine) > 1);

This produces for your data:

+---------+-----------+
| machine | ipaddress |
+---------+-----------+
| X       | 2.3.4.5   |
| Y       | 2.3.4.5   |
| Z       | 2.3.4.5   |
+---------+-----------+

This alternative syntax also works and may (or may not) be faster for MySQL (do a performance test if that is important for you):

SELECT DISTINCT machine, ipaddress
FROM [table]
WHERE ipaddress IN (
 SELECT ipaddress
   FROM [table]
  GROUP BY ipaddress
 HAVING COUNT(DISTINCT machine) > 1);

Upvotes: 1

Kuzgun
Kuzgun

Reputation: 4737

Try this one:

  SELECT machine, ipaddress FROM 
        (SELECT DISTINCT machine, ipaddress FROM table) t 
    GROUP BY ipaddress,machine HAVING count(*) > 1

Upvotes: 0

Related Questions