vijay shanker
vijay shanker

Reputation: 2663

finding distinct pairs in sql

I was trying to learn non-equi joins when I encountered this problem. I have a table pops:

+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| country    | varchar(100) | YES  |     | NULL    |       |
| continent  | varchar(100) | YES  |     | NULL    |       |
| population | bigint(20)   | YES  |     | NULL    |       |

I was trying to find countries with their population in the vicinity of say, 100.

   select distinct 
       p1.country, 
       p2.country, 
       p1.population, 
       p2.population 
   from pops p1 
       inner join pops p2 
           on p1.population between p2.population - 100 and p2.population + 100 
           and p1.country <> p2.country  
   where p2.country <> p1.country

output I got was:

+------------+------------+------------+------------+
| country    | country    | population | population |
+------------+------------+------------+------------+
| pakistan   | india      |      99988 |      99999 |
| china      | india      |      99990 |      99999 |
| bangladesh | japan      |        999 |        999 |
| india      | pakistan   |      99999 |      99988 |
| china      | pakistan   |      99990 |      99988 |
| japan      | bangladesh |        999 |        999 |
| india      | china      |      99999 |      99990 |
| pakistan   | china      |      99988 |      99990 |
+------------+------------+------------+------------+

as we can see, I am getting pairs of (india, pakistan) as well as (pakistan, india), which is data-wise the same thing. Is it possible to eliminate one of the records from the pair?

Upvotes: 1

Views: 141

Answers (2)

Li Mengran
Li Mengran

Reputation: 367

just change the join condition of p1.country <> p2.country to p1.country < p2.country

Upvotes: 3

Mureinik
Mureinik

Reputation: 311448

You could decide to always have the lexographically first (or last, for argument's sake) country on the p1 side - use < (or >) instead of <>. Also, note that your where clause is redundant, since you already have this condition in the on clause of the join:

select     p1.country, 
           p2.country, 
           p1.population, 
           p2.population 
from       pops p1 
inner join pops p2 
           on p1.population between p2.population - 100 and p2.population + 100 and
               p1.country < p2.country  
-- Here ------------------^

Upvotes: 7

Related Questions