Postgresql concurrent index

Question

I'm trying to understand how PostgreSQL builds index concurrently without write lock.

Could someone describe the steps performed by PostgreSQL for that to do this while continuously written into the table data?

Craig Ringer · Accepted Answer

The relevant detail is in the source code comments. See the comments on validate_index in src/backend/catalog/index.c around line 2607:

We do a concurrent index build by first inserting the catalog entry for the index via index_create(), marking it not indisready and not indisvalid. Then we commit our transaction and start a new one, then we wait for all transactions that could have been modifying the table to terminate.

.... and lots, lots more. Basically "it's complicated". I'll attempt to explain it, but I haven't read the code in detail and I don't know this part of the codebase, so the only correct explanation is the comments and source code.

My understanding is that it does an initial build based on an MVCC snapshot of the table state, committing it when it's done. It then waits until all transactions can see the (broken) index, at which point they'll all be updating it when they change things in the table. It then compares what was visible when it built the index to what is visible now and updates the index to reflect the differences between the snapshots. It then waits to make sure there are no transactions that could see the index while it was in an invalid state, marks the index valid, and commits again.

The whole process relies heavily on MVCC snapshots and visibility. It's also considerably more expensive in terms of I/O, CPU and RAM than a regular index build is.

validate_index is called by DefineIndex in src/backend/commands/indexcmds.c, which contains details about the overall process.

Postgresql concurrent index

Answers (1)

Related Questions