Reputation: 95
PostgreSQL 9.6
I do
ALTER TABLE "Users"
ADD COLUMN "someField" BOOLEAN NULL DEFAULT NULL;
DB freezes for 10+ minutes.
The table "Users"
has 5,000 rows, but the database is very big (150+ Gb).
Immediately, more than 20+ SELECT
queries on Users table show up, (checked with):
SELECT query FROM pg_stat_activity
WHERE state = 'active' and query LIKE 'SELECT%'
(before, there were no queries)
These SELECT
queries take all the CPU.
I tried to restart the database and tried to do VACUUM ANALYZE
on the "Users"
table:
INFO: vacuuming "public.Users"
INFO: index "Users_pkey" now contains 5556 row versions in 86 pages
DETAIL: 0 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.12 sec.
INFO: index "users_subscription_canceled" now contains 5028 row versions in 508 pages
DETAIL: 0 index row versions were removed.
8 index pages have been deleted, 8 are currently reusable.
CPU 0.00s/0.00u sec elapsed 1.54 sec.
INFO: index "users_shopper_id" now contains 5556 row versions in 205 pages
DETAIL: 0 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.64 sec.
INFO: index "users_referrer" now contains 5556 row versions in 586 pages
DETAIL: 0 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.00u sec elapsed 1.80 sec.
INFO: index "users_referral_code" now contains 5556 row versions in 84 pages
DETAIL: 0 index row versions were removed.
2 index pages have been deleted, 2 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.27 sec.
INFO: "Users": found 0 removable, 2137 nonremovable row versions in 803 out of 2780 pages
DETAIL: 445 dead row versions cannot be removed yet.
There were 25647 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 6.91 sec.
INFO: "Users": stopping truncate due to conflicting lock request
INFO: vacuuming "pg_toast.pg_toast_26460"
INFO: index "pg_toast_26460_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_26460": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.Users"
INFO: "Users": scanned 2780 of 2780 pages, containing 5539 live rows and 459 dead rows; 5539 rows in sample, 5539 estimated total rows
VACUUM
DROP COLUMN
or RENAME COLUMN
, the same points 4 & 5 happen - and database freezes.Questions:
Is this some kind of error with the database? Adding Nullable field should be really fast.
Any recommendations are really welcome, I am tired of Googling and debugging it :)
Upvotes: 5
Views: 1188
Reputation: 246238
This is normal behavior; the problem is your workload.
You are right that ALTER TABLE ... ADD COLUMN
is a very fast operation. That is not the problem. The problem is that such an ALTER TABLE
needs a short ACCESS EXCLUSIVE
lock on the table, since it modifies the table structure.
Such an ACCESS EXCLUSIVE
lock is not compatible with the ACCESS SHARE
lock that a SELECT
statement puts on the table. That is the purpose: how should a SELECT
statement behave if the table changes while it is running?
Now the problem is that either your queries take a long time, or somebody forgot to close a transaction that has a lock on the table.
You can check this with
SELECT pid, a.state, a.xact_start
FROM pg_locks AS l
JOIN pg_stat_activity AS a USING (pid)
WHERE l.relation = 'Users'::regclass;
That will show all transactions that have a lock on the table and when they started.
Now your ALTER TABLE
has to wait until all these transactions are done, and all the short SELECT
statements that are issued later have to queue behind the ALTER TABLE
.
As soon as the ALTER TABLE
gets the lock it needs, it will be done very quickly and release the lock on the table. Now all the other statements that have been queuing will be turned loose at the same time and create high load on your machine.
The solution consists of two parts:
Fix the application so that it closes transactions right away.
Reduce max_connections
as much as possible by using a connection pool. Then the number of statements that can be blocked is limited, and there is less danger of overloading the machine.
Upvotes: 5