wannaBeDev
wannaBeDev

Reputation: 739

SQL query to remove incorrect entries from a database with date column as decision criterion

Due to incorrect insertions I have several wrong entries in a database. These occur in groups of four. Examples:

ID_A  Startdate     Enddate      ID_B
296   05.09.2013    28.10.2017   62
296   05.09.2013    01.01.2999   62
296   28.10.2017    05.09.2013   65
296   28.10.2017    01.01.2999   65

285   26.02.2016    07.09.2018   93
285   26.02.2016    01.01.2999   93
285   07.09.2018    26.02.2016   58
285   07.09.2018    01.01.2999   58

There can only be one valid entry for a given period. The second and third entries do not belong in the table and must be removed. The end date of the first entry must be the same as the start date of the next. I can easily identify the third entry because the end date is smaller than the start date.

Delete from table where ID_A = 296 and Enddate < Startdate

But for the second line I have no idea yet. My desired result is to make two valid entries from the upper four lines:

ID_A  Startdate     Enddate      ID_B
296   05.09.2013    28.10.2017   62
296   28.10.2017    01.01.2999   65

285   26.02.2016    07.09.2018   93
285   07.09.2018    01.01.2999   58

Anybody got any ideas on how I should go about this?

Upvotes: 0

Views: 729

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270463

I think you can do:

delete from t
    where not exists (select 1
                      from t t2
                      where t2.startdate = t.enddate
                     ) and
          exists (select 1
                  from t t2
                  where t2.startdate < t.startdate or
                        (t2.startdate = t.startdate and t2.enddate < t.enddate)
                 );

Here is a db<>fiddle.

This works for your sample data. It is checking that there is a "matching" record before a current record -- and that the current record is not the first. It would not work for all incarnations. In fact, you could have some complex interleaved records where a single solution is not obvious.

Upvotes: 1

Ankit Bajpai
Ankit Bajpai

Reputation: 13517

You may try below -

Delete from table t1
where exists (select 1
              from table t2
              where t1.ID_A = t2.ID_A
              and t1.Startdate = t2.Startdate
              and t1.ID_B = t2.ID_B
              and t1.Enddate > t2.Enddate)

Upvotes: 1

Related Questions