user1496783
user1496783

Reputation: 27

Collecting statistics in Teradata

In case of a nested query in teradata, if I have to collect statistics. Whats the ideal way to do it :

(1) Will I collect statistics on the entire nested query.

(2) Or will I collect statistics on only that part of a query on which I need statistics on?

I know its a kind off ambiguous question but I wanted to know the ideal way to approach this scenario.

Upvotes: 1

Views: 3280

Answers (1)

Rob Paller
Rob Paller

Reputation: 7786

TL;DR - Option 2

You can begin determining what the optimizer believes is necessary from a statistics perspective by using the following diagnostic command and then running an EXPLAIN for the query in question:

SET DIAGNOSTIC HELPSTATS ON FOR SESSION;

At the end of the EXPLAIN the optimizer will include, albeit aggressive at times, the missing statistics that the optimizer believes will improve the query plan confidence. For starters, I would focus on the statistics it indicates with High Confidence and re-run the EXPLAIN afterward to compare the changes in the query plan. (You saved the original EXPLAIN output, right?) It is an iterative process and with each additional set of statistics you collect you should compare the EXPLAIN output to determine if it improved the confidence of particular steps or provided the optimizer with a better alternative. The general rule of thumb is that statistics should never result in a less optimal plan.

Often the HELPSTATS diagnostic will include many additional statistics with lower confidence. Keep in mind that if your environment has a scheduled process to maintain statistics every additional set of stats you collect may increase the overhead of that process. You will wish to coordinate with the DBA of the environment any additional statistics you collect so they are aware. In some environments, statistics may be maintained as part of the physical data model so anything you collect may need to be documented.

Lastly, Teradata 14.10 takes everything you thought you knew about statistics maintenance and turns it upside down. Database Query Logging is being expanded to tie into this new statistics "process" (for lack of a better term) and when enabled the process can suggest statistics to collect based on query patterns on the system. It can also identify statistics that have been collected that are not being used and stop collecting them.

Upvotes: 2

Related Questions