Reputation: 2112

Postgres Delete records from table keeping a minimum and maximum value

I have a Postgres table like so:

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-28 2:00  |156      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-20 8:00  |156      |AR   |
|34678 |2020-07-22 1:00  |200      |MS   |

and essentially I want to delete every row such that I only keep 2 rows per scan ID. I want to retain the scan IDs with the minimum time and with the maximum time.

The current workflow is that data gets aggregated and written to this table every day, so after it's written there may be a bunch of new scan events, but I only want to keep the max and min. How would I go about doing that?

EDIT: The desired result table would look like this

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-22 1:00  |200      |MS   |

Upvotes: 0

Answers (3)

Gordon Linoff

Reputation: 1270653

You can use using:

delete from t
    using (select scanId, min(scandatetime) as min_sdt, max(scandatetime) as max_sdt
           from t
           group by scanid
          ) tt
    where tt.scanId = t.scanId and t.scandatetime not in (tt.min_sdt, tt.max_sdt);

You could also phrase this as:

delete from t
    where scandatetime <> (select min(t2.scandatetime) from t tt where tt.scanid = t.scanid) and
          scandatetime <> (select max(t2.scandatetime) from t tt where tt.scanid = t.scanid) ;

Upvotes: 2

wildplasser

Reputation: 44250

A record is in the middle if there is (at least) one above it, and (at least) one below it:

DELETE FROM ztable d
WHERE EXISTS ( SELECT *         
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime < d.scandatetime
        )
AND EXISTS ( SELECT *
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime > d.scandatetime
        );

Similar trick, using row_number():

DELETE FROM ztable d
USING ( SELECT scanId, scandatetime
        , row_number() OVER 
                (PARTITION BY scanId ORDER BY scandatetime ASC) rn
        , row_number() OVER
                (PARTITION BY scanId ORDER BY scandatetime DESC) rrn
        FROM  ztable 
        ) x
        WHERE x.scanId = d.scanId
        AND x.scandatetime = d.scandatetime
        AND x.rn <> 1 AND x.rrn <> 1

        ;

Upvotes: 1

user330315

Reputation:

You can use a NOT IN with a sub-select:

delete from the_table t1
where (scanid, scandatetime) not in (select scanid, min(scandatetime)
                                     from the_table
                                     group by scanid
                                     union all
                                     select scanid, max(scandatetime)
                                     from the_table
                                     group by scanid);

But I assume the solution using exists will be faster.

Upvotes: 0

Postgres Delete records from table keeping a minimum and maximum value

Answers (3)

Related Questions