Reputation: 72
I am trying to understand how PostgreSQL physical index layout is. What I came to know is indexes are stores as part of set of pages with a B tree data structure. I am trying to understand how vacuumming impacts indexes. Does it help to contain its size?
Upvotes: 1
Views: 695
Reputation: 246523
B-tree indexes are decade-old technology, so a web search will turn up plenty of good detailed descriptions. In a nutshell:
A B-tree is a balanced tree of index pages (8KB in PostgreSQL), that is, every branch of the tree has the same depth. The tree is usually drawn upside down, the starting (top) node is the root node, and the pages at the bottom are called leaf nodes. Each level of the tree partitions the search space; the deeper the level, the finer the partitioning, until the individual index entries are reached in the leaf nodes. Each entry in an index page points to a table entry (in the leaf nodes) or to another index page at the next level.
This is a sketch of an index with depth three, but mind the following:
┌───────────┐
level 1 (root node) │ 20 75 100 │
└───────────┘
╱ ╱ │ ╲
╱ ╱ │ ╲
╱ ╱ │ ╲
┌───────────┐┌─────┐┌──────────┐┌─────┐
level 2 │ 5 10 15 ││ ... ││ 80 87 95 ││ ... │
└───────────┘└─────┘└──────────┘└─────┘
╱ ╱ │ ╲
╱ ╱ │ ╲
╱ ╱ │ ╲
┌─────┐┌─────┐┌──────────┐┌─────┐
level 3 (leaf nodes) │ ... ││ ... ││ 89 91 92 ││ ... │
└─────┘└─────┘└──────────┘└─────┘
Some notes:
Now to your question:
When a table (heap) entry is removed by VACUUM
because it is not visible for any active snapshot, the corresponding entry in the index is removed as well. This results in empty space in the index, which can be reused by future index entries.
Empty index pages can be deleted, but the depth of the index will never be reduced. So mass deletion can (after VACUUM
has done its job) reduce the index size, but will more likely result in a bloated index with pages that contain only few keys and lots of empty space.
A certain amount of index bloat (up to more than 50%) is normal, but if unusual usage patterns like mass updates and deletes cause bad index bloat, you'll have to rewrite the index with REINDEX
, thereby getting rid of bloat. This operation locks the index, so that all concurrent access is blocked until it is done. If that is a problem, you can use REINDEX ... CONCURRENTLY
, which does not take long-lasting heavy locks, but takes longer to complete.
Upvotes: 5