Teddy Miller
Teddy Miller

Reputation: 11

SQL - Avoid duplicates when self-joining a table

I am currently trying to join a table to itself to determine differences in dates. The entries list attributes for different units that have a distinct ID. Every time the entry is edited, it adds a new entry with the new attributes, instead of rewriting it. I am trying to find all the entries where the start or end date are different. I am able to find all entries that the dates change, but I get duplicates displayed as such:

ID StartDate1 EndDate1 StartDate2 EndDate2
1 1/12/2018 1/15/2018 1/13/2018 1/16/2018
1 1/13/2018 1/16/2018 1/12/2018 1/15/2018

my code is the following:
SELECT DISTINCT u1.ID, u1.FirstNight, u1.LastNight, u2.FirstNight, u2.LastNight FROM units u1 LEFT JOIN units u2 ON u2.ID = u1.ID WHERE ((u1.firstnight <> u2.firstnight) AND (u1.lastnight <> u2.lastnight)

What might I do to remove the duplicate entries?
Thanks!

Upvotes: 1

Views: 3471

Answers (2)

Yulio Aleman Jimenez
Yulio Aleman Jimenez

Reputation: 1677

There is no a simple solution for what you want to do. You must play a little with the values of the fields to achieve the desired result.

Try this SQL code

Schema

create table dates (id int, first date, last date);

insert into dates (id, first, last) VALUES (1, '2018/03/11', '2018/5/11');
insert into dates (id, first, last) VALUES (1, '2018/08/11', '2018/10/11');
insert into dates (id, first, last) VALUES (1, '2018/03/25', '2018/09/16');
insert into dates (id, first, last) VALUES (1, '2018/02/25', '2018/04/16');

insert into dates (id, first, last) VALUES (2, '2018/05/10', '2018/08/09');
insert into dates (id, first, last) VALUES (2, '2018/09/15', '2018/07/18');

Query

SELECT id1, first1, last1, id2, first2, last2 FROM 
(

    SELECT distinct 

    d1.id id1, 
    d1.first first1, 
    d1.last last1, 
    d2.id id2, 
    d2.first first2, 
    d2.last last2,
    ROW_NUMBER() OVER(PARTITION BY (d1.first + d2.first)) rn

    FROM dates d1
    JOIN dates d2 ON d2.id = d1.id

    WHERE (d1.first <> d2.first) AND (d1.last <> d2.last)

) AS DataResult

WHERE rn = 1;

How it works

  • The subquery select the desired fields,
  • but the last field use an Aggregate Window Functions with the ROW_NUMBER() function.
  • This last, count all element partitioned by the Sum of the First Dates in self joined table

  • The outside query select the desired fields less the Row Number field, besides use the Row Number calculated to restrict to 1 in the Where clause.

Here are a SQL Fiddle to test a bit more!

Upvotes: 1

Bohemian
Bohemian

Reputation: 425228

I think your query meant to have

u2.ID != u1.ID

To stop each row joining to itself.

Change that condition to:

u2.ID < u1.ID

to also return only 1 version of each pair.

Upvotes: 2

Related Questions