shekar Roy
shekar Roy

Reputation: 61

Batch delete in Postgres using a cursor

I am new to Postgres and have a task to delete records from a live table that was created unpartitioned. Now my head is towards creating a cursor to delete the records in a controlled way.

The steps I have in my head:

Would that make sense?

Upvotes: 6

Views: 22478

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658452

Just DELETE. Forget the cursor.

Unless your table is huge and you are deleting a large percentage of rows, there is no point in doing it in batches. And there is hardly any point even then. The only reasons I can think of would be to allow VACUUM to remove dead tuples earlier, which might help in special cases. Or to evade locking contention (possible deadlocks?), but that begs the question why rows to be deleted should be locked by concurrent transactions in the first place.

Locks are per row for this. There is no conflict with concurrent INSERT or UPDATE to different rows. (If you have UPDATEs targeting some of the same rows, you have a bigger problem.) And writers don't block readers anyway on Postgres.

You can create an SQL cursor with the WITH HOLD option and then use it with DELETE ... WHERE CURRENT OF in separate transactions. But you would have to throw in FOR UPDATE, locking all affected rows anyway. Rarely makes sense, except when you want to lock all affected rows quickly but still do something with them before deleting and there may be smarter ways ...

It can make sense to partition a big UPDATE - in separate transactions - so that dead tuples can be reused in H.O.T. updates (after a manual VACUUM or autovacuum has kicked in). But that hardly applies to DELETE operations which do not reuse space. Also, DELETE is a lot faster than UPDATE.

In the unlikely event that you still need to do it in batches, still don't use a cursor. Use something like:

WITH cte AS (
   SELECT id                 -- your PK
   FROM   tbl
   WHERE  date < $something  -- your condition
   -- ORDER BY ???           -- optional, see below
   LIMIT  50000
   FOR    UPDATE             -- SKIP LOCKED ?
   )
DELETE FROM tbl
USING  cte
WHERE  tbl.id = cte.id;

Repeat until no rows are found.

If your data is (mostly) physically sorted in a certain way, it can pay to order rows accordingly (the quoted ORDER BY). The ORDER BY imposes a cost of its own, but each DELETE may be able to access far fewer data pages with clustered rows and be faster that way. Depends on the use case; if ORDER BY can use an index, the outlook is better.

See:

Upvotes: 14

Related Questions