Ray Y
Ray Y

Reputation: 1311

Deleting duplicate entries from postgres database

I'm having a little trouble trying to delete multiple entries from my database which are very similar but not exactly the same. I have looked here and many other solutions:

Delete Duplicate Records in PostgreSQL

How to delete duplicate rows with SQL?

Everytime I try to delete multiple duplicates from the table, The command removes all the entries instead of just removing the duplicates.

Here is the table with a duplicate sample, where we should only keep one unique hdrtime:

SELECT * from stdtextproducts where xxxid='DEN' AND nnnid='MTR' and hdrtime='270600';
 cccid |  datacrc   | hdrtime | nnnid | site | wmoid  | xxxid | bbbid |       inserttime        |                                 product                                  |    reftime    
-------+------------+---------+-------+------+--------+-------+-------+-------------------------+--------------------------------------------------------------------------+---------------
 DEN   | 3680361181 | 270600  | MTR   | KDEN | SAUS70 | DEN   | RRF   | 2018-08-27 05:55:51.811 | SAUS70 KDEN 270600 RRF                                                  +| 1535349351811
       |            |         |       |      |        |       |       |                         | METAR KDEN 270553Z 22017KT 10SM BKN150 OVC200 23/06 A2991 RMK AO2 PK WND+| 
       |            |         |       |      |        |       |       |                         |      22026/0456 SLP028 T02330056 10289 20222 58004                       | 
 DEN   | 1538417601 | 270600  | MTR   | KDEN | SAUS70 | DEN   | RRM   | 2018-08-27 05:57:57.356 | SAUS70 KDEN 270600 RRM                                                  +| 1535349477356
       |            |         |       |      |        |       |       |                         | METAR KDEN 270553Z 22017KT 10SM BKN150 OVC200 23/06 A2991 RMK AO2 PK WND+| 
       |            |         |       |      |        |       |       |                         |      22026/0456 SLP028 T02330056 10289 20222 58004                       | 
(2 rows)

I have tried the following:

DELETE FROM stdtextproducts a USING (SELECT MIN(ctid) as ctid, hdrtime FROM stdtextproducts GROUP BY hdrtime HAVING COUNT(*) > 1) b WHERE a.hdrtime = b.hdrtime AND a.ctid <> b.ctid;

And the following:

DELETE FROM stdtextproducts WHERE reftime NOT IN (SELECT MAX(reftime) FROM stdtextproducts GROUP BY hdrtime);

I should be expecting only one entry to show up on the list but it seems none of the entries are there anymore.

SELECT * from stdtextproducts where xxxid='DEN' AND nnnid='MTR' and hdrtime='270600';
 cccid | datacrc | hdrtime | nnnid | site | wmoid | xxxid | bbbid | inserttime | product | reftime 
-------+---------+---------+-------+------+-------+-------+-------+------------+---------+---------
(0 rows)

What am I missing here?

Thanks in advance.

Upvotes: 0

Views: 168

Answers (2)

Joe Love
Joe Love

Reputation: 5932

DELETE FROM stdtextproducts a where
   exists 
    (SELECT *
    FROM stdtextproducts b
    where a.hdrtime=b.hdrtime 
    and b.ctid<a.ctid
    ) 

If this does not work, are you sure there isn't another KEY here besides just hrdtime?

Upvotes: 0

Zaynul Abadin Tuhin
Zaynul Abadin Tuhin

Reputation: 32003

try like below

  DELETE FROM stdtextproducts a where
   a.ctid<>   
    (SELECT MIN(b.ctid) as ctid
    FROM stdtextproducts b
    where a.hdrtime=b.hdrtime  
    ) 

or

   DELETE FROM stdtextproducts T1 
   USING   stdtextproducts  T2
    WHERE   T1.ctid < T2.ctid  -- delete the older versions
    AND T1.hdrtime= T2.hdrtime ;  -- add more columns if needed

Upvotes: 1

Related Questions