Reputation: 71
I'm using PostgreSQL 11, and I have the following two tables:
As you can see, the
registration
table has a primary key composed of 3 varchar columns (I know), and the task
table has a foreign key to the registration
table. Each of these tables has around 4-6 million records. I also have the following query (generated by JPA) that is executed semi-regularly:
select registrati0_.product_service_instance_id as product_6_4_1_, registrati0_.service_id as service_1_4_1_, registrati0_.version_id as version_2_4_1_, registrati0_.created as created3_4_1_, registrati0_.dependencies as dependen4_4_1_, registrati0_.is_manually_created as is_manua5_4_1_, tasks1_.registration_product_service_instance_id as registra8_6_3_, tasks1_.registration_service_id as registra9_6_3_, tasks1_.registration_version_id as registr10_6_3_, tasks1_.name as name1_6_3_, tasks1_.name as name1_6_0_, tasks1_.registration_product_service_instance_id as registra8_6_0_, tasks1_.registration_service_id as registra9_6_0_, tasks1_.registration_version_id as registr10_6_0_, tasks1_.content_etag as content_2_6_0_, tasks1_.created as created3_6_0_, tasks1_.depends_on as depends_4_6_0_, tasks1_.phases as phases5_6_0_, tasks1_.retry_count as retry_co6_6_0_ from registration registrati0_ left outer join task tasks1_ on registrati0_.product_service_instance_id=tasks1_.registration_product_service_instance_id and registrati0_.service_id=tasks1_.registration_service_id and registrati0_.version_id=tasks1_.registration_version_id where registrati0_.product_service_instance_id=$1 and registrati0_.service_id=$2 and registrati0_.version_id=$3;
It's kind of hard to read, so I'll try to simplify it a bit:
SELECT registration.*, task.*
FROM registration
LEFT OUTER JOIN task ON registration.product_service_instance_id=task.registration_product_service_instance_id
AND registration.service_id=task.registration_service_id
AND registration.version_id=task.registration_version_id
WHERE registration.product_service_instance_id=$1
AND registration.service_id=$2
AND registration.version_id=$3
(Sorry about the crap formatting!)
This query used to have a decent performance (<4 seconds). However, after adding the created
column to the task
table, its execution time jumped up to >4 minutes even though I am not using the new column in the WHERE
clause! Before moving on further, here's how I added the column:
ALTER TABLE task
ADD COLUMN created TIMESTAMP; -- Nullable
UPDATE task SET created=CURRENT_TIMESTAMP WHERE created IS NULL;
From what I read online, updating a row in PostgreSQL is equal to INSERT
+ DELETE
, so my update probably resulted in a complete table re-write. Can anyone confirm this, by the way? However, even if that's true, why did the query suddenly become so slow?
I know about the PostgreSQL issue with adding a new column with a default value, but that only affects the performance of the update query itself - not of consecutive selects. Maybe there was some cache on PostgreSQL-side (a temporary index or something) that got invalidated when I added the new column?
The only thing that brought the query's execution time back to reasonable values was adding a new index to the task table:
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_task_registration_ids ON task (registration_product_service_instance_id, registration_service_id, registration_version_id)
It made the LEFT OUTER JOIN
extremely fast.
Can anyone explain what happened so that I can avoid such issues in the future?
Upvotes: 0
Views: 584
Reputation: 67
Please check the bloat on the table. This update would have created a lot of dirty records which can be cleared using the VACUUM operation. Check query for bloat here: https://wiki.postgresql.org/wiki/Show_database_bloat
Upvotes: 2