Reputation: 3
I have a question about the number of deadtuples after an update scenario in PGSQL.
Due to MVCC protocol in PGSQL, many updates cause the tables to bloat. We may track this with the number of dead tuples in the table. In the following scenario, I can't give any meaning the number of dead tuples in the following scenario.
I am updating "pgbench_accounts" table with the following testcase:
First, I turnedd off autovacuum in the system beforehand. ...=alter system set autovacuum to off; ...=# select pg_reload_conf();
Then I load the pgbench tables. > pgbench -i -- Now, pgbench_accounts has 100.000 tuples
Then I am updating randomly generated tuples as below: >pgbench -t10000 -fupdate-only -n
update-only tx is as the following:
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
END;
As you guess, I want to update 10.000 tuples with a serial of simple update tx as shown above.
After the process end, I am seeing: 9543 updated tuples. That seems OK, because some tuples are updated twice or more. The problem is I am seeing only 3277 dead tuples.
Although I turned off autovacuum, some dead tuples disappear. how does this happen? I am expecting to see, at least the same number of updated tuples?
Upvotes: 0
Views: 275
Reputation: 247370
It could be that not all updates were counted (statistics are not necessarily 100% accurate), but more likely the difference can be explained with HOT updates.
If there is still room in the block, and you don't update an indexed column, PostgreSQL can use a HOT update that doesn't require modifying the index. Dead tuples from HOT updates don't need VACUUM
for cleanup. Any SELECT
or other statement (in your case: UPDATE
) can grab a brief lock on the page and reorganize it, getting rid of dead tuples.
You can verify that with
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_all_tables
WHERE relname = 'pgbench_accounts';
Upvotes: 1