TDo
TDo

Reputation: 744

Eliminating rows with duplicate value

So I have a table like this:

Number      | Number 1     | Date     | Date 1
------------+--------------+----------+----------
1...........| 1000.........| 5........| 10........
2...........| 1000.........| 7........| 10........
1...........| 1001.........| 5........| 11........
2...........| 1001.........| 7........| 11........
8...........| 2000.........| 18.......| 20........
9...........| 2000.........| 19.......| 20........

I want to remove duplicate values in both column Number and Number 1 while keeping the difference between Date and Date 1 minimum. For example, for the 2 rows where Number 1 is 2000, I delete the row where Number is 8 and keep the row where Number is 9 since 19 is closer to 20 than 18.

But I am stuck when trying to remove duplicate values in the first 4 rows of the table. If I use the above method, I would delete all rows where Number is 1 since the difference between Date and Date 1 in these rows is larger than that where Number is 2. This pattern occurs many times in my table so I cannot manually delete duplicate values.

Upvotes: 1

Views: 48

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

The following gets information about the rows you want to keep:

select t.*,
       (select t2.date1
        from table t2
        where t.number1 = t2.number1
        order by abs(t2.date1 - t2.date)
        limit 1
       ) as date1
from table t;

You can incorporate this into a delete using join:

delete t
    from table t left join
         (select t.*,
                 (select t2.date1
                  from table t2
                  where t.number1 = t2.number1
                  order by abs(t2.date1 - t2.date)
                  limit 1
                 ) as date1
          from table t
         ) tt
         on tt.number1 = t.number1 and tt.date1 = t.date
    where tt.number1 is null;

Upvotes: 1

Related Questions