Reputation: 1227
I have a variety of tables that I am joining together. Each table has a primary index and most but not all are partitioned on a date field. Each table has an associated view.
If I write a query in the form
select
*
from view1
join view2
on pi1 = pi2
join view3
on pi1 = pi3
join view4
on pi1 = pi4
...
I run into a out of spool space problem. Would it be better to query the tables directly? Would it be better to create some intermediate tables and do a few joins at a time, then create new indices and partitions on the intermediate tables?
Upvotes: 0
Views: 4770
Reputation: 60482
Creating intermediate tables should not be neccessary.
Without knowing further details there might be a simple cause:
There a two tables like invoice and invoice_line, the logical PK is (invoice_number) and (invoice_number, line_number).
The Primary INdex of both tables is (invoice_number) to get all rows for an invoice on a single AMP for faster processing.
Both tables are partitioned by invoice_date (in fact keeping the invoice_date in the invoice_line is not needed, because it's the same date for each line. It's done to get matching partitioning on both tables)
The join doesn't inlcude the invoice_date, it's just based on invoice_number. This is correct based on the PK-FK but will result in a very slow join because the optimizer doesn't know which invoice_number is stored in which partition -> all partitions need to be accessed.
In a case like that you must use invoice_date as an additional join condition.
Otherwise you must supply more info:
As already mentioned: you should post the Explain.
Additionally it might help to get the PI definition (plus partitioning) and some statistics information. The easiest way to get the DDL of all objects is a SHOW in front of the select (unless you DBA restricted that), stats are returned by HELP STATS tablename;
Upvotes: 3
Reputation:
You should first check the "Explain" output of your query. [if you are using Teradata SQL Assistant, then just select your query and press F6 - this will output the Parsing Engine's(PE) plan about how to execute your query].
I suspect that you will see a lot of "re-distribution" [I think, Teradata is a closet Socialist] in the Explain output - remember for two rows to be joined, they must be located on the same AMP. If they are not, by virtue of different PI on each of the tables you are joining via views, redistribution will be necessary.
You would also want to check if you need to collect any statistics on some of columns. Improper statistics could lead to the PE coming up with rediculous query plans. For ex: If one of the table you are joining is huge but a skewed table - the PE might incorrectly detect that it is actually a small table and try to duplicate it to all the AMPS (instead of re-distribution) and this usually results in you running out of space.
Why don't you go ahead and post the "Explain" of your query? Set this option first: DIAGNOSTIC HELPSTATS ON FOR SESSION;
Without looking at what the views are doing it is difficult to tell.
Upvotes: 2