barteloma
barteloma

Reputation: 6845

How to get Postgresql total cost time from explain

I have an sql query on postgresql 9.5, but it takes too long time. And I run the explain query:

DELETE FROM source v1
    WHERE id < (SELECT MAX(id)
            FROM source v2
            WHERE v2.ent_id = v1.ent_id
            AND v2.name = v1.name
        );      

And ex plain is

Delete on source v1  (cost=0.00..1764410287608.21 rows=2891175 width=6)');
  ->  Seq Scan on source v1  (cost=0.00..1764410287608.21 rows=2891175 width=6)');
        Filter: (id < (SubPlan 2))');
        SubPlan 2');
          ->  Result  (cost=203424.76..203424.77 rows=1 width=0)');
                InitPlan 1 (returns $2)');
                  ->  Limit  (cost=0.43..203424.76 rows=1 width=8)');
                        ->  Index Scan Backward using source_id_ix on source v2  (cost=0.43..813697.74 rows=4 width=8)');
                              Index Cond: (id IS NOT NULL)');
                              Filter: (((ent_id)::text = (v1.ent_id)::text) AND ((name)::text = (v1.name)::text))');

My table has about 8.000.000 records. And I could not get the result for days. And I could not calculate how many times will take? is there any way for a new solution?

Upvotes: 0

Views: 1091

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246228

There is no really good way to predict execution time.

As a very rough rule of thumb, you can compare a cost of 1 to the time of reading one 8 KB page from disk during a sequential scan, but that will often be off by more than an order of magnitude.

To solve the underlying problem, try

DELETE FROM source AS v1
WHERE EXISTS (SELECT 1
              FROM source AS v2
              WHERE (v1.ent_id, v1.name) = (v2.ent_id, v2.name)
              AND v2.id > v1.id);

The problem with your query is that it has to execute an expensive subselect for every row found, while mine can perform a semijoin. Look at my query's execution plan.

Upvotes: 2

Related Questions