Reputation: 1
I am trying to improve the performance of the base using the cluster
command. For tests, I created copies of the original tables and sorted the data in them by foreign keys, by which the most frequent selections are made.
There are big improvements for low selectivity queries. For example, for queries like select * from table1 t1 join table2 t2 on t1.a = t2.a where t1.a = 1;
performance improved by 10-1000 times.
However, for queries with high selectivity the performance gain is either zero or negative. For example, a query like select a, sum(b) from table1 group by a;
takes the same amount of time as before. And in some cases, when using several related tables with full reading, without filtering, the execution time increases by more than 2 times.
The query cost in queries against tables with sorted data has decreased by several times.
I tried to collect more detailed statistics: for foreign keys I executed the alter table ... command. alter column ... set statistics = 10000;
After that, the query cost usually decreases by another 10-100 times. But it does not lead to actual acceleration. The structure of the query plan (for example, methods of accessing tables or using indexes) does not change. Only the cost changes.
P.S.: For some reasons I can't attach the query plans temporarily.
Upvotes: 0
Views: 40
Reputation: 247615
Clustering a table (rewriting it in index order) will only help with index range scans, that is, when you select many adjacent rows through an index (a condition like WHERE key < 1000
). If you just retrieve a single row using an index, the physical ordering doesn't matter. Also, if you scan the whole table, clustering might help if you need the data sorted in index order, but other than that you will see no benefit.
Increasing the statistics target will only help improve the estimates, so it is only beneficial if it changes the estimates significantly so that PostgreSQL can pick a better execution plan. Other than that, more detailed statistics won't help. On the contrary, they will slow down processing, because the optimizer has to dig through the large histogram or list of most common values.
That said, I don't see how clustering can slow down query processing. Perhaps the physical ordering happened to be more beneficial before, but most likely it is an unrelated effect, like less of your data happened to be cached.
Upvotes: 2