Reputation: 151
I've got a really big collection of data in a postgres database where I'd like to nuke data past a particular age... but I do not want it nuking the latest iteration of data from any given location & site combination.
Basically, I've got a really big table that has a location (bigint)
, site (bigint)
, readdate (bigint)
, and a little accompanying data (note: there will be multiple entries for a given site, location, and readdate - but anything on the same readdate is considered part of the same scan, and needs to be kept for a given location).
Currently, I've just got it set to get rid of all old records... but the possibility exists that a particular site and location combination will stop giving out data for a while, and I'd like to preserve the final state if that happens. I'm doing the SQL queries from php, so I'm pretty sure I could hack together some highly ugly code that finds the latest readdate for any given site & location combination, then either deletes stuff younger than that for that location, or deletes based on the calender limit (whichever gives the lesser date), but I'd prefer to put the decision-making workload in the SQL query, rather than having to first get a list of all location, site, and max(readdate) entries, then iterate over them in php making individual delete queries.
My current query (which doesn't do what I want, as it deletes everything before $limit) is declared by:
$query="DELETE FROM votwdata WHERE readdate < '".$limit."';";
Upvotes: 0
Views: 95
Reputation: 389
If I understand what you are trying to do, you have a number of fields that might be the same, and you want to keep the most recent record. Assuming you have a sequential ID or a created_at on each record, you can run a subquery to identify the records you want to delete. For example:
select max(id),data1,data2 from table group by data1,data2;
That will pull the most recent record for a unique data1 and data2. You can run that as an inline query, joining it back to the original table.
select t.* from table t, (select max(id) "id",data1,data2 from table group by data1,data2) t2 where t.id=t2.id;
That will give you the most recent records. You can do an left join and look at the null values to delete anything that you don't like.
select t.id,t2.id from table t left join (select max(id) "id",data1,data2 from table group by 2,3) t2 on t.id=t2.id where t2.id is null;
That gives you all the records that you want to delete.
Okay, that's the dirty way - refactor away.
Upvotes: 1