Reputation: 61
I am new to Postgres and have a task to delete records from a live table that was created unpartitioned. Now my head is towards creating a cursor to delete the records in a controlled way.
The steps I have in my head:
Would that make sense?
Upvotes: 6
Views: 22478
Reputation: 658452
Just DELETE
. Forget the cursor.
Unless your table is huge and you are deleting a large percentage of rows, there is no point in doing it in batches. And there is hardly any point even then. The only reasons I can think of would be to allow VACUUM
to remove dead tuples earlier, which might help in special cases. Or to evade locking contention (possible deadlocks?), but that begs the question why rows to be deleted should be locked by concurrent transactions in the first place.
Locks are per row for this. There is no conflict with concurrent INSERT
or UPDATE
to different rows. (If you have UPDATEs targeting some of the same rows, you have a bigger problem.) And writers don't block readers anyway on Postgres.
You can create an SQL cursor with the WITH HOLD
option and then use it with DELETE ... WHERE CURRENT OF
in separate transactions. But you would have to throw in FOR UPDATE
, locking all affected rows anyway. Rarely makes sense, except when you want to lock all affected rows quickly but still do something with them before deleting and there may be smarter ways ...
It can make sense to partition a big UPDATE
- in separate transactions - so that dead tuples can be reused in H.O.T. updates (after a manual VACUUM
or autovacuum
has kicked in). But that hardly applies to DELETE
operations which do not reuse space. Also, DELETE
is a lot faster than UPDATE
.
In the unlikely event that you still need to do it in batches, still don't use a cursor. Use something like:
WITH cte AS (
SELECT id -- your PK
FROM tbl
WHERE date < $something -- your condition
-- ORDER BY ??? -- optional, see below
LIMIT 50000
FOR UPDATE -- SKIP LOCKED ?
)
DELETE FROM tbl
USING cte
WHERE tbl.id = cte.id;
Repeat until no rows are found.
If your data is (mostly) physically sorted in a certain way, it can pay to order rows accordingly (the quoted ORDER BY
). The ORDER BY
imposes a cost of its own, but each DELETE
may be able to access far fewer data pages with clustered rows and be faster that way. Depends on the use case; if ORDER BY
can use an index, the outlook is better.
See:
Upvotes: 14