Reputation: 2976
New to Postgres, I can't understand why changing a type from varchar(255) to varchar(36) on a field with an index drastically decreases performances from 40ms to 5s on a huge request with joins on this field.
I tried to reindex and got same result. If the field comes back to varchar(255) the good performances are back too!
Is-it a problem because the index on the field is joined to a field on another table that have kept varchar(255) ?
EDIT: added query plan (good luck...) - after changing to varchar (36)
Limit (cost=548253.86..548253.88 rows=8 width=1648) (actual time=11558.810..11558.812 rows=8 loops=1)
-> Sort (cost=548253.86..548302.20 rows=19337 width=1648) (actual time=11558.807..11558.808 rows=8 loops=1)
Sort Key: usr_wines.winery
Sort Method: top-N heapsort Memory: 28kB
-> GroupAggregate (cost=547238.66..547867.12 rows=19337 width=1648) (actual time=11557.853..11558.635 rows=28 loops=1)
Group Key: usr_wines.id, shd_countries.image, shd_regions.name, shd_appellations.name, shd_categories.image, shd_categories.name_en, shd_volumes.name
-> Sort (cost=547238.66..547287.01 rows=19337 width=1677) (actual time=11557.795..11557.865 rows=380 loops=1)
Sort Key: usr_wines.id, shd_countries.image, shd_regions.name, shd_appellations.name, shd_categories.image, shd_categories.name_en, shd_volumes.name
Sort Method: quicksort Memory: 220kB
-> Nested Loop Left Join (cost=97215.88..531781.46 rows=19337 width=1677) (actual time=1400.559..11554.940 rows=380 loops=1)
-> Nested Loop Left Join (cost=97215.74..528521.66 rows=19337 width=1675) (actual time=1400.532..11552.199 rows=380 loops=1)
Join Filter: (usr_wines.category_id = shd_categories.id)
Rows Removed by Join Filter: 1900
-> Hash Right Join (cost=97215.74..526780.25 rows=19337 width=651) (actual time=1400.480..11550.852 rows=380 loops=1)
Hash Cond: ((usr_bottles.wine_id)::text = (usr_wines.id)::text)
-> Seq Scan on usr_bottles (cost=0.00..308700.55 rows=4111092 width=74) (actual time=0.511..7610.026 rows=4384249 loops=1)
Filter: ((output_type IS NULL) AND (status = 1))
Rows Removed by Filter: 2354675
-> Hash (cost=96206.63..96206.63 rows=11129 width=614) (actual time=1219.200..1219.200 rows=120 loops=1)
Buckets: 8192 Batches: 2 Memory Usage: 86kB
-> Hash Left Join (cost=64006.72..96206.63 rows=11129 width=614) (actual time=707.669..1218.683 rows=120 loops=1)
Hash Cond: (usr_wines.appellation_id = shd_appellations.id)
-> Hash Left Join (cost=63905.91..95978.92 rows=11129 width=599) (actual time=706.019..1216.918 rows=120 loops=1)
Hash Cond: (usr_wines.region_id = shd_regions.id)
-> Hash Left Join (cost=63891.53..95822.44 rows=11129 width=587) (actual time=705.721..1216.489 rows=120 loops=1)
Hash Cond: (usr_wines.country_id = shd_countries.id)
-> Hash Right Join (cost=63883.26..95667.68 rows=11129 width=574) (actual time=705.346..1215.972 rows=120 loops=1)
Hash Cond: ((usr_tastings.wine_id)::text = (usr_wines.id)::text)
-> Seq Scan on usr_tastings (cost=0.00..21800.36 rows=471963 width=37) (actual time=0.343..304.762 rows=472470 loops=1)
Filter: (status = 1)
Rows Removed by Filter: 71559
-> Hash (cost=62928.14..62928.14 rows=11129 width=574) (actual time=654.845..654.845 rows=74 loops=1)
Buckets: 8192 Batches: 2 Memory Usage: 79kB
-> Hash Right Join (cost=47770.34..62928.14 rows=11129 width=574) (actual time=247.452..654.581 rows=74 loops=1)
Hash Cond: ((usr_ratings.wine_id)::text = (usr_wines.id)::text)
-> Seq Scan on usr_ratings (cost=0.00..10172.90 rows=214616 width=37) (actual time=0.473..170.696 rows=214368 loops=1)
Filter: (status = 1)
Rows Removed by Filter: 8904
-> Hash (cost=46815.23..46815.23 rows=11129 width=574) (actual time=199.285..199.285 rows=57 loops=1)
Buckets: 8192 Batches: 2 Memory Usage: 77kB
-> Hash Right Join (cost=36693.98..46815.23 rows=11129 width=574) (actual time=8.331..198.889 rows=57 loops=1)
Hash Cond: ((usr_pairings.wine_id)::text = (usr_wines.id)::text)
-> Seq Scan on usr_pairings (cost=0.00..5899.86 rows=175239 width=37) (actual time=0.036..92.293 rows=174939 loops=1)
Filter: (status = 1)
Rows Removed by Filter: 7450
-> Hash (cost=35738.87..35738.87 rows=11129 width=574) (actual time=1.107..1.107 rows=28 loops=1)
Buckets: 8192 Batches: 2 Memory Usage: 74kB
-> Bitmap Heap Scan on usr_wines (cost=223.94..35738.87 rows=11129 width=574) (actual time=0.331..0.491 rows=28 loops=1)
Recheck Cond: (user_id = 1)
Filter: (status = 1)
Rows Removed by Filter: 14
Heap Blocks: exact=42
-> Bitmap Index Scan on usr_wines_on_user_id_idx (cost=0.00..221.16 rows=11830 width=0) (actual time=0.038..0.038 rows=42 loops=1)
Index Cond: (user_id = 1)
-> Hash (cost=5.90..5.90 rows=190 width=21) (actual time=0.195..0.195 rows=190 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 19kB
-> Seq Scan on shd_countries (cost=0.00..5.90 rows=190 width=21) (actual time=0.033..0.132 rows=190 loops=1)
-> Hash (cost=9.17..9.17 rows=417 width=20) (actual time=0.270..0.270 rows=417 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 31kB
-> Seq Scan on shd_regions (cost=0.00..9.17 rows=417 width=20) (actual time=0.018..0.132 rows=417 loops=1)
-> Hash (cost=66.47..66.47 rows=2747 width=23) (actual time=1.599..1.599 rows=2747 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 190kB
-> Seq Scan on shd_appellations (cost=0.00..66.47 rows=2747 width=23) (actual time=0.016..0.854 rows=2747 loops=1)
-> Materialize (cost=0.00..1.09 rows=6 width=1040) (actual time=0.000..0.001 rows=6 loops=380)
-> Seq Scan on shd_categories (cost=0.00..1.06 rows=6 width=1040) (actual time=0.018..0.023 rows=6 loops=1)
-> Index Scan using shd_volumes_pkey on shd_volumes (cost=0.14..0.16 rows=1 width=18) (actual time=0.005..0.005 rows=1 loops=380)
Index Cond: (usr_wines.volume_id = id)
Planning time: 45.061 ms
Execution time: 11560.189 ms
EDIT 2: this morning, after reopening my MacBookPro, everything is at the same speed as yesterday before the migration !!
Look at the same EXPLAIN and the delays:
Limit (cost=25254.28..25254.30 rows=8 width=1587) (actual time=16.349..16.351 rows=8 loops=1)
-> Sort (cost=25254.28..25254.90 rows=247 width=1587) (actual time=16.347..16.347 rows=8 loops=1)
Sort Key: usr_wines.winery
Sort Method: top-N heapsort Memory: 28kB
-> GroupAggregate (cost=25241.31..25249.34 rows=247 width=1587) (actual time=15.156..16.153 rows=28 loops=1)
Group Key: usr_wines.id, shd_countries.image, shd_regions.name, shd_appellations.name, shd_categories.image, shd_categories.name_en, shd_volumes.name
-> Sort (cost=25241.31..25241.93 rows=247 width=1616) (actual time=15.068..15.133 rows=380 loops=1)
Sort Key: usr_wines.id, shd_countries.image, shd_regions.name, shd_appellations.name, shd_categories.image, shd_categories.name_en, shd_volumes.name
Sort Method: quicksort Memory: 220kB
-> Nested Loop Left Join (cost=2.95..25231.50 rows=247 width=1616) (actual time=0.278..13.327 rows=380 loops=1)
Join Filter: (usr_wines.volume_id = shd_volumes.id)
Rows Removed by Join Filter: 13680
-> Nested Loop Left Join (cost=2.95..25092.95 rows=247 width=1614) (actual time=0.258..10.480 rows=380 loops=1)
Join Filter: (usr_wines.category_id = shd_categories.id)
Rows Removed by Join Filter: 1900
-> Nested Loop Left Join (cost=2.95..25069.64 rows=247 width=590) (actual time=0.228..9.727 rows=380 loops=1)
-> Nested Loop Left Join (cost=2.39..6867.48 rows=142 width=553) (actual time=0.179..4.943 rows=120 loops=1)
-> Nested Loop Left Join (cost=2.11..6627.81 rows=142 width=538) (actual time=0.173..4.636 rows=120 loops=1)
-> Nested Loop Left Join (cost=1.84..6549.21 rows=142 width=526) (actual time=0.166..4.333 rows=120 loops=1)
-> Nested Loop Left Join (cost=1.69..6500.72 rows=142 width=513) (actual time=0.161..4.040 rows=120 loops=1)
-> Nested Loop Left Join (cost=1.27..3776.28 rows=142 width=513) (actual time=0.120..2.399 rows=53 loops=1)
-> Nested Loop Left Join (cost=0.85..2044.12 rows=142 width=513) (actual time=0.085..1.112 rows=42 loops=1)
-> Index Scan using usr_wines_on_user_id_idx on usr_wines (cost=0.43..287.26 rows=142 width=513) (actual time=0.024..0.127 rows=28 loops=1)
Index Cond: (user_id = 1)
Filter: (status = 1)
Rows Removed by Filter: 14
-> Index Scan using usr_tastings_on_wine_id_idx on usr_tastings (cost=0.42..12.35 rows=2 width=37) (actual time=0.033..0.034 rows=1 loops=28)
Index Cond: ((usr_wines.id)::text = (wine_id)::text)
Filter: (status = 1)
-> Index Scan using usr_ratings_on_wine_id_idx on usr_ratings (cost=0.42..12.18 rows=2 width=37) (actual time=0.029..0.030 rows=1 loops=42)
Index Cond: ((usr_wines.id)::text = (wine_id)::text)
Filter: (status = 1)
-> Index Scan using usr_pairings_on_wine_id_idx on usr_pairings (cost=0.42..19.15 rows=4 width=37) (actual time=0.028..0.030 rows=2 loops=53)
Index Cond: ((usr_wines.id)::text = (wine_id)::text)
Filter: (status = 1)
-> Index Scan using shd_countries_pkey on shd_countries (cost=0.14..0.33 rows=1 width=21) (actual time=0.001..0.002 rows=1 loops=120)
Index Cond: (usr_wines.country_id = id)
-> Index Scan using shd_regions_pkey on shd_regions (cost=0.27..0.54 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=120)
Index Cond: (usr_wines.region_id = id)
-> Index Scan using shd_appellations_pkey on shd_appellations (cost=0.28..1.68 rows=1 width=23) (actual time=0.002..0.002 rows=1 loops=120)
Index Cond: (usr_wines.appellation_id = id)
-> Index Scan using usr_bottles_on_wine_id_idx on usr_bottles (cost=0.56..127.99 rows=19 width=74) (actual time=0.034..0.039 rows=3 loops=120)
Index Cond: ((usr_wines.id)::text = (wine_id)::text)
Filter: ((output_type IS NULL) AND (status = 1))
Rows Removed by Filter: 2
-> Materialize (cost=0.00..1.09 rows=6 width=1040) (actual time=0.000..0.001 rows=6 loops=380)
-> Seq Scan on shd_categories (cost=0.00..1.06 rows=6 width=1040) (actual time=0.016..0.018 rows=6 loops=1)
-> Materialize (cost=0.00..1.56 rows=37 width=18) (actual time=0.000..0.003 rows=37 loops=380)
-> Seq Scan on shd_volumes (cost=0.00..1.37 rows=37 width=18) (actual time=0.007..0.012 rows=37 loops=1)
Planning time: 29.919 ms
Execution time: 16.728 ms
Is-it possible that there is a kind of cache on indexes or something like that ?
The query:
SELECT usr_wines.id, usr_wines.winery, usr_wines.name, usr_wines.year, usr_wines.image_file_name, usr_wines.image2_file_name, usr_wines.image3_file_name, usr_wines.price, usr_wines.currency, usr_wines.maturity_from, usr_wines.maturity_to, usr_wines.apogee_from, usr_wines.apogee_to, usr_wines.favorite, usr_wines.wishlist, usr_wines.comment, usr_wines.alcohol, usr_wines.country_id, usr_wines.region_id, usr_wines.appellation_id, usr_wines.classification_id, usr_wines.hue_id, usr_wines.container_id, usr_wines.capsule_id, shd_countries.image AS country_image, shd_regions.name AS region_name, shd_appellations.name AS appellation_name, shd_categories.image AS category_image_name, shd_categories.name_en AS category_name, shd_volumes.name AS volume_name,
COUNT(DISTINCT(usr_bottles.id)) AS bottle_count
FROM usr_wines
LEFT JOIN usr_bottles ON usr_wines.id = usr_bottles.wine_id AND usr_bottles.status=1 AND usr_bottles.output_type IS NULL
LEFT JOIN usr_tastings ON usr_wines.id = usr_tastings.wine_id AND usr_tastings.status=1
LEFT JOIN usr_ratings ON usr_wines.id = usr_ratings.wine_id AND usr_ratings.status=1
LEFT JOIN usr_pairings ON usr_wines.id = usr_pairings.wine_id AND usr_pairings.status=1
LEFT JOIN shd_countries ON usr_wines.country_id = shd_countries.id
LEFT JOIN shd_regions ON usr_wines.region_id = shd_regions.id
LEFT JOIN shd_appellations ON usr_wines.appellation_id = shd_appellations.id
LEFT JOIN shd_classifications ON usr_wines.classification_id = shd_classifications.id
LEFT JOIN shd_categories ON usr_wines.category_id = shd_categories.id
LEFT JOIN shd_volumes ON usr_wines.volume_id = shd_volumes.id
WHERE usr_wines.user_id = 1 AND usr_wines.status=1
GROUP BY usr_wines.id, shd_countries.image, shd_regions.name, shd_appellations.name, shd_categories.image, shd_categories.name_en, shd_volumes.name
ORDER BY usr_wines.winery asc
LIMIT 8 OFFSET 0;
Upvotes: 0
Views: 155
Reputation: 247485
It is impossible to say what changed after you switched from varchar(255)
to varchar(36)
because you provided no EXPLAIN (ANALYZE)
output for the former, but we can guess that the problem is related to table statistics.
At any rate, if you compare the scans on table usr_wines
, your slow example has:
Bitmap Heap Scan on usr_wines (cost=223.94..35738.87 rows=11129 width=574)
(actual time=0.331..0.491 rows=28 loops=1)
Recheck Cond: (user_id = 1)
Filter: (status = 1)
Rows Removed by Filter: 14
Heap Blocks: exact=42
-> Bitmap Index Scan on usr_wines_on_user_id_idx (cost=0.00..221.16 rows=11830 width=0)
(actual time=0.038..0.038 rows=42 loops=1)
Index Cond: (user_id = 1)
which is an overestimate by a factor of almost 400, while your fast example has:
Index Scan using usr_wines_on_user_id_idx on usr_wines (cost=0.43..287.26 rows=142 width=513)
(actual time=0.024..0.127 rows=28 loops=1)
Index Cond: (user_id = 1)
Filter: (status = 1)
Rows Removed by Filter: 14
which is only wrong by a factor of 5.
Changing the data type will understandably remove all table statistics collected for the column, so subsequent queries will get bad query plans.
The solution is to ANALYZE
the table manually to calculate table statistics.
In your case, an autovacuum that ran over night fixed the problem for you.
Upvotes: 1