Reputation: 22500
I am having trouble understanding how recursive CTE works, in terms of the intermediate steps and the working/temporary tables involved along the way.
Slightly adapting the example from PostgreSQL Documention:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 5
)
SELECT n FROM t;
Running this in Postgres (9.5), I get:
n
---
1
2
3
4
5
(5 rows)
But why didn't we get more rows? For example
SELECT n+1 FROM t WHERE n < 5
when n = 2, why doesn't the table t
have two rows
---
1
2
and based on that generate
---
2
3
?
If that were the case, the end result should have many duplicate values, such as 2
with the UNION ALL
.
The relevant part of the documentation says the following about a "working table" and an "intermediate table", which is although descriptive not clear enough to me:
1.Evaluate the non-recursive term. ... Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
2.So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. ... Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
My questions are:
Can anyone please explain step-wise what is going on with the simple example above?
Also, I am trying to understand this recursive CTE from a programming perspective. Can anyone outline a skeleton for the algorithm in the above CTE to generate the sequence?
Upvotes: 9
Views: 1614
Reputation: 24551
Each time the second half of the CTE runs, it sees only the results of the previous run. So the initial run executes the top half and yields 1. The second run executes the bottom half. It sees t
as containing 1, so it returns 2. The third run sees t
as containing 2 (not 1 and 2, because it only sees the results of the previous run), so it returns 3.
The fourth run sees 3 and returns 4.
The fifth run sees 4 and returns 5.
The sixth run sees 5, but that is excluded by the WHERE
clause, so it returns no rows. Returning no rows is the signal to stop.
So now the complete result of the CTE is 1, 2, 3, 4, 5
, which is what everything outside the CTE sees.
Upvotes: 6
Reputation: 2473
A very important statement in the document you referenced clearly gives the reason to your observation.
Note: Strictly speaking, this process is iteration not recursion, but RECURSIVE is the terminology chosen by the SQL standards committee.
In short, WITH RECURSIVE
evaluation, is ironically iterative!
In your example, both the sets (the static set with 1
value & the SELECT
set with n
between 2
and 5
) are evaluated only once, and most importantly, in a top-down approach.
More so since, the oddity (in naming) came from the SQL standards committee, I doubt you have much leg room, but to understand the 'iterative' nature of the evaluation, despite the wording as 'Recursive'.
Upvotes: 5