Reputation: 6845
I have an SQL query on PostgreSQL 9.5, but it takes too long. I ran EXPLAIN on it:
DELETE FROM source v1
WHERE id < (SELECT MAX(id)
            FROM source v2
            WHERE v2.ent_id = v1.ent_id
              AND v2.name = v1.name
           );
And the EXPLAIN output is:
Delete on source v1  (cost=0.00..1764410287608.21 rows=2891175 width=6)
  ->  Seq Scan on source v1  (cost=0.00..1764410287608.21 rows=2891175 width=6)
        Filter: (id < (SubPlan 2))
        SubPlan 2
          ->  Result  (cost=203424.76..203424.77 rows=1 width=0)
                InitPlan 1 (returns $2)
                  ->  Limit  (cost=0.43..203424.76 rows=1 width=8)
                        ->  Index Scan Backward using source_id_ix on source v2  (cost=0.43..813697.74 rows=4 width=8)
                              Index Cond: (id IS NOT NULL)
                              Filter: (((ent_id)::text = (v1.ent_id)::text) AND ((name)::text = (v1.name)::text))
My table has about 8,000,000 records, and the query has not finished even after running for days. I could not estimate how long it will take. Is there a better way to do this?
Upvotes: 0
Views: 1091
Reputation: 246228
There is no really good way to predict execution time.
As a very rough rule of thumb, you can compare a cost of 1 to the time of reading one 8 KB page from disk during a sequential scan, but that will often be off by more than an order of magnitude.
To solve the underlying problem, try
DELETE FROM source AS v1
WHERE EXISTS (SELECT 1
              FROM source AS v2
              WHERE (v1.ent_id, v1.name) = (v2.ent_id, v2.name)
                AND v2.id > v1.id);
The problem with your query is that it has to execute an expensive subselect for every row it examines, while mine can be performed as a semijoin. Compare the execution plans.
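To illustrate what this EXISTS-based delete does (keep only the highest-id row per (ent_id, name) group), here is a minimal sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL; the table name and sample data are made up, and the outer alias is dropped because older SQLite versions don't accept one in DELETE:

```python
import sqlite3

# In-memory stand-in for the PostgreSQL table described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, ent_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO source (id, ent_id, name) VALUES (?, ?, ?)",
    [
        (1, "e1", "a"),  # duplicate of (e1, a) with a lower id: deleted
        (2, "e1", "a"),  # highest id for (e1, a): kept
        (3, "e2", "b"),  # only row for (e2, b): kept
    ],
)

# Delete every row for which another row with the same (ent_id, name)
# and a higher id exists, i.e. keep the max-id row of each group.
conn.execute(
    """
    DELETE FROM source
    WHERE EXISTS (SELECT 1
                  FROM source AS v2
                  WHERE v2.ent_id = source.ent_id
                    AND v2.name = source.name
                    AND v2.id > source.id)
    """
)

remaining = [row[0] for row in conn.execute("SELECT id FROM source ORDER BY id")]
print(remaining)  # [2, 3]
```

This is equivalent to the `id < MAX(id)` formulation, but the planner can evaluate the EXISTS as a join instead of re-running an aggregate subquery per row.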
Upvotes: 2