Reputation: 11

SQL - Avoid duplicates when self-joining a table

I am currently trying to join a table to itself to determine differences in dates. The entries list attributes for different units that have a distinct ID. Every time the entry is edited, it adds a new entry with the new attributes, instead of rewriting it. I am trying to find all the entries where the start or end date are different. I am able to find all entries that the dates change, but I get duplicates displayed as such:

ID StartDate1 EndDate1 StartDate2 EndDate2
1 1/12/2018 1/15/2018 1/13/2018 1/16/2018
1 1/13/2018 1/16/2018 1/12/2018 1/15/2018

my code is the following:
SELECT DISTINCT u1.ID, u1.FirstNight, u1.LastNight, u2.FirstNight, u2.LastNight FROM units u1 LEFT JOIN units u2 ON u2.ID = u1.ID WHERE ((u1.firstnight <> u2.firstnight) AND (u1.lastnight <> u2.lastnight)

What might I do to remove the duplicate entries?
Thanks!

Upvotes: 1

Answers (2)

Yulio Aleman Jimenez

Reputation: 1677

There is no a simple solution for what you want to do. You must play a little with the values of the fields to achieve the desired result.

Try this SQL code

Schema

create table dates (id int, first date, last date);

insert into dates (id, first, last) VALUES (1, '2018/03/11', '2018/5/11');
insert into dates (id, first, last) VALUES (1, '2018/08/11', '2018/10/11');
insert into dates (id, first, last) VALUES (1, '2018/03/25', '2018/09/16');
insert into dates (id, first, last) VALUES (1, '2018/02/25', '2018/04/16');

insert into dates (id, first, last) VALUES (2, '2018/05/10', '2018/08/09');
insert into dates (id, first, last) VALUES (2, '2018/09/15', '2018/07/18');

Query

SELECT id1, first1, last1, id2, first2, last2 FROM 
(

    SELECT distinct 

    d1.id id1, 
    d1.first first1, 
    d1.last last1, 
    d2.id id2, 
    d2.first first2, 
    d2.last last2,
    ROW_NUMBER() OVER(PARTITION BY (d1.first + d2.first)) rn

    FROM dates d1
    JOIN dates d2 ON d2.id = d1.id

    WHERE (d1.first <> d2.first) AND (d1.last <> d2.last)

) AS DataResult

WHERE rn = 1;

How it works

The subquery select the desired fields,
but the last field use an Aggregate Window Functions with the ROW_NUMBER() function.
This last, count all element partitioned by the Sum of the First Dates in self joined table
The outside query select the desired fields less the Row Number field, besides use the Row Number calculated to restrict to 1 in the Where clause.

Here are a SQL Fiddle to test a bit more!

Upvotes: 1

Bohemian

Reputation: 425228

I think your query meant to have

u2.ID != u1.ID

To stop each row joining to itself.

Change that condition to:

u2.ID < u1.ID

to also return only 1 version of each pair.

Upvotes: 2

SQL - Avoid duplicates when self-joining a table

Answers (2)

Schema

Query

How it works

Related Questions