AhmadF

Reputation: 353

Auto-tuning column statistics targets in PostgreSQL

As the PostgreSQL documentation points out, one way to improve query performance is to increase the statistics target for some columns. The default_statistics_target value is often not enough for large tables (a few million rows) with irregular value distributions and must be increased. It seems practical to create a script that auto-tunes the statistics target for each column. I would like to know what the possible obstacles are in writing such a script, and why I can't find one online.
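For context, this is the manual step such a script would automate; table and column names here are purely illustrative:

    -- Raise the per-column statistics target (default 100, maximum 10000)
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

    -- Recompute statistics so the planner sees the larger sample
    ANALYZE orders (customer_id);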

Upvotes: 0

Views: 898

Answers (2)

SQLpro

Reputation: 5121

One problem with PostgreSQL's planner statistics is that there is no way to compute statistics over all the rows in a table. PostgreSQL always uses a small sample of the table to compute statistics. This approach has a huge disadvantage on big tables: it can miss important values that make the difference when estimating the cardinality of some operations in the execution plan. This may cause the planner to choose an inappropriate algorithm.
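To make the sampling concrete: ANALYZE reads roughly 300 × statistics_target rows, so the default target of 100 samples about 30,000 rows no matter how large the table is. What the planner actually knows about a column can be inspected in pg_stats (table and column names are hypothetical):

    -- Check the global sampling target (default 100)
    SHOW default_statistics_target;

    -- Inspect what the planner knows about a column: most-common values,
    -- their frequencies, and the estimated number of distinct values
    SELECT n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'customer_id';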

Explanation : http://mssqlserver.fr/postgresql-vs-sql-server-mssql-part-3-very-extremely-detailed-comparison/

Especially § 12 – Planner statistics

The reason that PostgreSQL does not offer a "full scan" statistics mode is that it would take too much time to compute! In fact, PostgreSQL is very slow at many maintenance tasks, such as recomputing statistics, as I show here: http://mssqlserver.fr/postgresql-vs-microsoft-part-1-dba-queries-performances/

In some other RDBMSs it is possible to run UPDATE STATISTICS ... WITH FULLSCAN (Microsoft SQL Server, for example), and this does not take so much time, because SQL Server computes the statistics in parallel with multiple threads, which PostgreSQL is unable to do...
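For reference, the SQL Server statement looks like this; the closest PostgreSQL approximation is the maximum per-column target, which still only samples the table (table name is illustrative):

    -- SQL Server: build statistics from every row of the table
    UPDATE STATISTICS dbo.orders WITH FULLSCAN;

    -- PostgreSQL: no full-scan mode; even the maximum target of 10000
    -- samples only about 300 * 10000 = 3 million rows
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 10000;
    ANALYZE orders (customer_id);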

Conclusion: PostgreSQL was never designed for huge tables. Consider using another RDBMS if you need to deal with big tables and want performance...

Just take a look at the COUNT performance of PostgreSQL compared to MS SQL Server: http://mssqlserver.fr/postgresql-vs-microsoft-sql-server-comparison-part-2-count-performances/

Upvotes: -1

Laurenz Albe

Reputation: 246413

That's because it is not that simple. It does not primarily depend on the size of the table, but on the data in the table and their distribution, the way in which data are modified, and most of all on the queries.

So it is pretty much impossible to make that decision from a look at the persisted state, and even with more information it would require quite a bit of artificial intelligence.
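As a rough illustration of why the queries matter: the only reliable signal that a column's target is too low is a mismatch between the planner's row estimate and the actual row count for a real workload query, which no script can derive from catalog data alone (query and names are hypothetical):

    -- Run a representative query and compare the planner's estimate
    -- ("rows=...") with reality ("actual ... rows=...")
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE customer_id = 42;

    -- A plan line such as
    --   Seq Scan on orders (cost=... rows=1000 ...) (actual time=... rows=250000 loops=1)
    -- suggests the statistics for customer_id are too coarse.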

Upvotes: 0
