Foursuits
Foursuits

Reputation: 33

how do amps and spool interact with each other? Doing a close reading of EXPLAIN plan

Can someone help me do a close reading of EXPLAIN plan? I want to become better at reading these but they are still confusing to me. I have written everything I understand in bold, with questions.

EXPLAIN SELECT DISTINCT ProcID FROM dbadmin.v_DBATraceLog;

1) First, we lock dbadmin.DBATraceLog for access. (lock table)

2) Next, we do an all-AMPs RETRIEVE step from dbadmin.DBATraceLog by way of an all-rows scan ( so full table scan )

with no residual conditions into Spool 1 (group_amps), (Why is it called Spool 1? How many Spools are there per amp, because I thought there was only one spool space per user? Why does it say group_amps? Can all amps access a common Spool 1 or is Spool 1 a local spool on each amp?)

which is redistributed by hash code to all AMPs. (the rows leave Spool 1 and enter disk space on each AMP now, I guess)

Then we do a SORT to order Spool 1 (nvm, it is still on Spool 1?)

by the sort key in spool field1 (Sorting puts like records adjacent, I guess that is how duplicates can be eliminated) eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 4,592 rows. The estimated time for this step is 0.03 seconds.

3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.


Mostly I don't understand how AMPS and Spool interact with each other. My understanding is that it's an intermediate space to keep results or to stage records for redistribution because joins have to happen amp-local. I thought this is a common space that all amps have equal access to (though if one amp requires more than 1/num_of_amps amount of spool space the process is aborted).

Why then is there numberings for Spool in the Explain plan? Where is spool 1 or 3? Is there a spool 1 on every amp? What does group_amps mean in this context?

Upvotes: 1

Views: 506

Answers (1)

dnoeth
dnoeth

Reputation: 60482

There might be multiple spools per query, one for each intermediate result and one for the final result set. Intermediate spools are created in one step, used in a following step (or multiple steps) and finally dropped, indicated by (LastUse). The answer set spool (The contents of Spool n are sent back) is released after the last row of the result set has been delivered to the client.

A spool with the number n might exist on a single AMP, or multiple AMPs or all AMPs. (group_amps) indicates the optimizer assumes that it's a small result and the spool might only be created on some of the AMPs (which will not participate in the next step).

The spool number doesn't have a meaning, it's just a number assigned by the optimizer (although those number are usually sequential), when you check the same query in Viewpoint's QueryMonitor it will be a different, usually much higher, number.

A spool can be

  • locally: each AMP spools rows directly without interaction with other AMPs
  • redistributed by hash code of (column list): the optimizer assigns a Primary Index to the spool and the data is distributed based on the hash of those columns.
  • duplicated on all AMPs: each row is copied to each AMP

There's more detail on Explain output found in the SQL Request and Transaction Processing manual

Upvotes: 0

Related Questions