Reputation: 2112
I have a Postgres table like so:
|scanID|scandatetime |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00 |123 |WA |
|12345 |2020-07-28 2:00 |156 |WA |
|12345 |2020-07-29 10:00 |200 |OR |
|34678 |2020-07-20 4:00 |123 |TX |
|34678 |2020-07-20 8:00 |156 |AR |
|34678 |2020-07-22 1:00 |200 |MS |
and essentially I want to delete every row such that I only keep 2 rows per scan ID. I want to retain the scan IDs with the minimum time and with the maximum time.
The current workflow is that data gets aggregated and written to this table every day, so after it's written there may be a bunch of new scan events, but I only want to keep the max and min. How would I go about doing that?
EDIT: The desired result table would look like this
|scanID|scandatetime |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00 |123 |WA |
|12345 |2020-07-29 10:00 |200 |OR |
|34678 |2020-07-20 4:00 |123 |TX |
|34678 |2020-07-22 1:00 |200 |MS |
Upvotes: 0
Views: 1473
Reputation: 1270653
You can use using
:
delete from t
using (select scanId, min(scandatetime) as min_sdt, max(scandatetime) as max_sdt
from t
group by scanid
) tt
where tt.scanId = t.scanId and t.scandatetime not in (tt.min_sdt, tt.max_sdt);
You could also phrase this as:
delete from t
where scandatetime <> (select min(t2.scandatetime) from t tt where tt.scanid = t.scanid) and
scandatetime <> (select max(t2.scandatetime) from t tt where tt.scanid = t.scanid) ;
Upvotes: 2
Reputation: 44250
A record is in the middle if there is (at least) one above it, and (at least) one below it:
DELETE FROM ztable d
WHERE EXISTS ( SELECT *
FROM ztable x
WHERE x.scanId = d.scanId
AND x.scandatetime < d.scandatetime
)
AND EXISTS ( SELECT *
FROM ztable x
WHERE x.scanId = d.scanId
AND x.scandatetime > d.scandatetime
);
Similar trick, using row_number()
:
DELETE FROM ztable d
USING ( SELECT scanId, scandatetime
, row_number() OVER
(PARTITION BY scanId ORDER BY scandatetime ASC) rn
, row_number() OVER
(PARTITION BY scanId ORDER BY scandatetime DESC) rrn
FROM ztable
) x
WHERE x.scanId = d.scanId
AND x.scandatetime = d.scandatetime
AND x.rn <> 1 AND x.rrn <> 1
;
Upvotes: 1
Reputation:
You can use a NOT IN with a sub-select:
delete from the_table t1
where (scanid, scandatetime) not in (select scanid, min(scandatetime)
from the_table
group by scanid
union all
select scanid, max(scandatetime)
from the_table
group by scanid);
But I assume the solution using exists
will be faster.
Upvotes: 0