david mah
david mah

Reputation: 315

SQL Query where the result has duplicate rows

So I have to write a query in SQL where I list the celebrities that have been in relationships with the same celeb. I basically list out celeb1, celeb2, and celeb3 where celeb3 has been in a relationship with both celeb1 and celeb2. Here's the query I'm using:

SELECT S1.Celeb1, S2.Celeb2, S3.name AS Celeb3
FROM Relationships S1, Relationships S2, Celebs S3
WHERE S3.name = S1.Celeb2 
AND S3.name = S2.Celeb1 
AND S1.Celeb1 <> S2.Celeb2;

It's difficult to know if this query is correct since it gives me 200 rows in the result but I looked at a few of the row and it looks like it's giving me the correct result where celeb3 has been in a relationship with both celeb1 and 2. The problem is that there are duplicate rows in the result. This could be due to the fact that in the relationships table, it lists out the celeb1, celeb2 relationship but it also lists the inverse celeb2, celeb1 as well. So how can I prevent the result from listing out duplicates?

Here are the two tables I'm using to do this (Relationships and Celebs).

CREATE TABLE Celebs(
  name VARCHAR(30)
);

CREATE TABLE Relationships (
  Celeb1 VARCHAR(30), 
  Celeb2 VARCHAR(30)
);

Upvotes: 0

Views: 177

Answers (3)

MT0
MT0

Reputation: 167877

Oracle Setup:

CREATE TABLE celebs ( name ) AS
  SELECT 'A' FROM DUAL UNION ALL
  SELECT 'B' FROM DUAL UNION ALL
  SELECT 'C' FROM DUAL UNION ALL
  SELECT 'D' FROM DUAL;

CREATE TABLE relationships ( celeb1, celeb2 ) AS
  SELECT 'A', 'B' FROM DUAL UNION ALL
  SELECT 'B', 'C' FROM DUAL UNION ALL
  SELECT 'C', 'D' FROM DUAL;

Query:

SELECT DISTINCT
       c.name,
       CASE c.name WHEN r.celeb1 THEN r.celeb2 ELSE r.celeb1 END AS has_relationship_with
FROM   celebs c
       LEFT OUTER JOIN
       relationships r
       ON ( c.name = r.celeb1 OR c.name = r.celeb2 );

Output:

NAME HAS_RELATIONSHIP_WITH
---- ---------------------
A    B
B    A
B    C
C    B
C    D
D    C

If you want A,B and do not want the inverse B,A then change the ON clause for the join to:

ON (  ( c.name = r.celeb1 AND c.name < r.celeb2 )
   OR ( c.name = r.celeb2 AND c.name < r.celeb1 ) )

Query 2:

You could then group this using LISTAGG to just get one row per person:

SELECT name,
       LISTAGG( rel, ',' ) WITHIN GROUP ( ORDER BY rel ) AS has_relationship_with
FROM   (
  SELECT DISTINCT
         c.name,
         CASE c.name WHEN r.celeb1 THEN r.celeb2 ELSE r.celeb1 END AS rel
  FROM   celebs c
         LEFT OUTER JOIN
         relationships r
         ON ( c.name = r.celeb1 OR c.name = r.celeb2 )
)
GROUP BY name;

Output:

NAME HAS_RELATIONSHIP_WITH
---- ---------------------
A    B
B    A,C
C    B,D
D    C

Upvotes: 0

Thorsten Kettner
Thorsten Kettner

Reputation: 94884

Let's look at a sample:

celeb1  celeb2
A       B
B       C
C       D

Expected result:

  • A and C were both with B.
  • B and D were both with C.

In order to find these matches I suggest to duplicate tuples such that each couple is twice in the table (if that is not already the case).

celeb1  celeb2
A       B
B       A
B       C
C       B
C       D
D       C

We can already see that B and C each had two partners. Join this data set with itself so as to connect the records.

with rel as
(
  select celeb1 as cel1, celeb2 as cel2 from relationships 
  union 
  select celeb2 as cel1, celeb1 as cel2 from relationships 
)
select rel1.cel2 as celeb1, rel2.cel2 as celeb2, rel1.cel1 as partner
from rel rel1
join rel rel2 on rel2.cel1 = rel1.cel1 and rel2.cel2 > rel1.cel2
order by 1, 2, 3;

Upvotes: 1

Erich Kitzmueller
Erich Kitzmueller

Reputation: 36977

If Celeb3 was in a relationship with A and B, you will also get B, A as a result. To avoid that, just make a constraint that A > B:

SELECT DISTINCT S1.Celeb1, S2.Celeb2, S3.name AS Celeb3
FROM Relationships S1, Relationships S2, Celebs S3
WHERE S3.name = S1.Celeb2 
AND S3.name = S2.Celeb1 
AND S1.Celeb1 > S2.Celeb2

Upvotes: 0

Related Questions