Reputation: 189
I have a very large table, 400MM records, that normally only receives inserts. Recently, however, I have had to make a lot of updates to the records to complete a task, and this is creating a lot of dead tuples. I have updated the global configs to the following:
autovacuum_vacuum_scale_factor = 0
autovacuum_vacuum_threshold = 10000
autovacuum_vacuum_cost_limit = 2000
autovacuum_max_workers = 6
With these settings, my intent was that any time the dead tuple count rises past 10,000 records, autovacuum would clean them up.
However, I'm finding that when the table is busy with other inserts, etc., the dead tuple count does not change; it stays fixed. Only when the database activity slows down at night does the autovacuum appear to work well.
I need the autovacuum to aggressively deal with dead tuples throughout the day. How would I accomplish this? Do I need to increase the max_workers count?
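For reference, the dead tuple counts I mention come from a query along these lines (a sketch; pg_stat_user_tables is the standard statistics view, and big_table is the anonymized name used below):
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'big_table';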
UPDATE: user @Laurenz Albe suggested I run some performance numbers with and without dead tuples to demonstrate the difference.
Below are the SQL query and the EXPLAIN (ANALYZE, BUFFERS) results; I changed the name of the table and group key for privacy.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ld.upid,
MAX(ld.lid)
INTO _tt_test_with_dead_tuples
FROM big_table ld
GROUP BY ld.upid;
-- >>> With roughly 1% (3.648MM out of 383.2MM) dead tuples, results below.
HashAggregate (cost=25579746.07..25584552.97 rows=480690 width=8) (actual time=5966760.520..5975279.359 rows=16238417 loops=1)
Group Key: upid
Buffers: shared hit=3015376 read=16753169 dirtied=1759802 written=1360458
-> Seq Scan on big_table ld (cost=0.00..23642679.05 rows=387413405 width=8) (actual time=0.024..5593239.148 rows=383753513 loops=1)
Buffers: shared hit=3015376 read=16753169 dirtied=1759802 written=1360458
Planning time: 2.677 ms
Execution time: 6012319.846 ms
-- >>> With 0 dead tuples, results below.
HashAggregate (cost=25558409.48..25562861.52 rows=445204 width=8) (actual time=825662.640..835163.398 rows=16238417 loops=1)
Group Key: upid
Buffers: shared hit=15812 read=19753809
-> Seq Scan on big_table ld (cost=0.00..23628813.32 rows=385919232 width=8) (actual time=0.020..533386.128 rows=383753513 loops=1)
Buffers: shared hit=15812 read=19753809
Planning time: 10.109 ms
Execution time: 843319.731 ms
Upvotes: 3
Views: 4217
Reputation: 247380
The dead tuples are not your problem.
Your real problem is somewhere else; I have highlighted it in the following.
Sequential scan in the slow query:
Buffers: shared hit=3015376 read=16753169 dirtied=1759802 written=1360458
Sequential scan in the fast query:
Buffers: shared hit=15812 read=19753809
Judging from the "dirtied" counter, close to two million table blocks contain tuples that were recently written or updated.
At the time a tuple is written, PostgreSQL does not yet know whether the transaction will commit or roll back, so this information is not stored in the tuple. It is, however, recorded in the commit log, which is stored in pg_xact (or pg_clog, depending on your version).
Now the first reader that comes along and reads a newly written tuple will have to consult the commit log to figure out whether the tuple “exists” or not. To save future readers that hassle, it sets the so-called hint bits in the tuple to reflect that information.
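If you want to see those hint bits directly, and assuming the pageinspect extension is installed (it requires superuser rights), a query like this shows whether HEAP_XMIN_COMMITTED is already set for the tuples of one block; block 0 of big_table is just an example:
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lp,
       t_xmin,
       (t_infomask & 256) <> 0 AS xmin_committed_hint  -- 256 = 0x0100 = HEAP_XMIN_COMMITTED
FROM heap_page_items(get_raw_page('big_table', 0));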
This changes and hence “dirties” the block containing the tuple, and if you have a small shared_buffers setting and run out of available buffers, the backend will even have to write blocks out to storage to clean them and make room.
This writing is what makes your query so slow.
Vacuuming the table makes the problem go away because VACUUM not only cleans up dead tuples, but also sets the hint bits for you (it is a reader too!).
To verify that, run the same SELECT a second time without vacuuming the table, and you will observe that it is just as fast with the 3 million dead tuples, because now the hint bits are all set.
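Concretely, that check could look like this (the same query as in the question, minus the INTO; expect few or no "dirtied"/"written" buffers on the second pass):
EXPLAIN (ANALYZE, BUFFERS)
SELECT ld.upid,
       MAX(ld.lid)
FROM big_table ld
GROUP BY ld.upid;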
This is one reason why it can be a good idea to run VACUUM on a table after you have loaded a lot of rows, even if there is nothing to clean up: you save the first reader a lot of work.
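A plain VACUUM right after the load is enough for that, for example (VERBOSE only makes it report what it did):
VACUUM (VERBOSE) big_table;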
Idea: does increasing shared_buffers improve the situation?
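To check the current value and, if necessary, raise it (the 8GB below is only an example value; size it to your RAM, and note that the change requires a server restart):
SHOW shared_buffers;
ALTER SYSTEM SET shared_buffers = '8GB';  -- example value only, takes effect after a restart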
But since vacuuming the table solves the problem, you may as well use autovacuum to set the hint bits more often.
For that, you can set autovacuum_vacuum_scale_factor to 0 and autovacuum_vacuum_threshold to a large constant (way bigger than 10000), so that there are never too many rows without hint bits.
In addition, set autovacuum_vacuum_cost_delay to 0 so that autovacuum finishes fast.
Don't change these parameters globally; use ALTER TABLE ... SET (...) to set them for this table only.
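That could look like this (the threshold value is just one example of a "large constant"):
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold    = 1000000,
    autovacuum_vacuum_cost_delay   = 0
);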
Upvotes: 4