Reputation: 8490
Here: (To find infinite recursive loop in CTE) is a discussion how to prevent an infinite loop in a recursive query. There the recursion is prevented on the "query level" - at least in an answer about Postgresql.
Is there a way in Postgresql (10) to implement some kind of safety net to prevent infinite recursions? Is it a feasible way to use statement_timeout
for this or is there any other widely accepted way?
Upvotes: 19
Views: 14698
Reputation: 382762
If you don't need to return the recursion depth as an output, then using UNION
instead of UNION ALL
is the way to go on both PostgreSQL and SQLite as mentioned at: https://stackoverflow.com/a/77165059/895245
However if you also want to track the depth of results, that won't do as the depth column makes the values become different from one another.
The solutions from the answer To find infinite recursive loop in CTE cover that as well however.
Test data
Create some demo data:
CREATE TABLE "graph" ("id" integer, "parent_id" integer);
INSERT INTO "graph" VALUES
(1, null), -- root element
(2, 1),
(3, 1),
(4, 3),
(5, 4),
(3, 5); -- endless loop
Graph represented:
1 --+--> 2
|
+--> 3 --> 4 --> 5 --+
^ |
| |
+---------------+
PostgreSQL 14+: CYCLE
keyword
WITH RECURSIVE "cte"("id", "parent_id", "depth") AS (
SELECT "id",
"parent_id",
0
FROM "graph"
WHERE "parent_id" IS NULL
UNION ALL
SELECT "graph"."id",
"graph"."parent_id",
"cte"."depth" + 1
FROM "graph"
INNER JOIN "cte"
ON "graph"."parent_id" = "cte"."id"
)
CYCLE "id" -- track cycles for this column
SET "is_cycle" -- adds a boolean column is_cycle
USING "path" -- adds a column that contains all parents for the id
SELECT *
FROM "cte"
WHERE NOT "is_cycle"
ORDER BY "id" ASC;
Output:
id | parent_id | depth | is_cycle | path
----+-----------+-------+----------+-------------------
1 | | 0 | f | {(1)}
2 | 1 | 1 | f | {(1),(2)}
3 | 1 | 1 | f | {(1),(3)}
4 | 3 | 2 | f | {(1),(3),(4)}
5 | 4 | 3 | f | {(1),(3),(4),(5)}
(5 rows)
So we understand that this method tracks the paths down to each element and uses that information to detect loops and stop before.
Detect loops instead of ignoring them
If we wanted instead to check if a loop exists in the DB we could just change:
WHERE NOT "is_cycle";
to:
WHERE "is_cycle";
which then gives us the loop path:
id | parent_id | depth | is_cycle | path
----+-----------+-------+----------+-----------------------
3 | 5 | 4 | t | {(1),(3),(4),(5),(3)}
See also: To find infinite recursive loop in CTE
PostgreSQL pre-14: explicit array manipulation
The CYCLE
keyword is a shortcut for the following which uses an explicit array to track the path and detect loops:
WITH RECURSIVE "cte"("id", "parent_id", "depth", "is_cycle", "path") AS (
SELECT "id",
"parent_id",
0,
false,
ARRAY["id"] as "path"
FROM "graph"
WHERE "parent_id" IS NULL
UNION ALL
SELECT "graph"."id",
"graph"."parent_id",
"cte"."depth" + 1,
"graph"."id" = any("path"),
"cte"."path" || "graph"."id"
FROM "graph"
INNER JOIN "cte"
ON "graph"."parent_id" = "cte"."id"
AND NOT "is_cycle"
)
SELECT *
FROM "cte"
WHERE NOT "is_cycle"
ORDER BY "id" ASC;
Output:
id | parent_id | depth | is_cycle | path
----+-----------+-------+----------+-----------
1 | | 0 | f | {1}
2 | 1 | 1 | f | {1,2}
3 | 1 | 1 | f | {1,3}
4 | 3 | 2 | f | {1,3,4}
5 | 4 | 3 | f | {1,3,4,5}
(5 rows)
Using UNION
instead of UNION ALL
when depth
is not needed
Just to give an example of that method:
WITH RECURSIVE "cte"("id", "parent_id") AS (
SELECT "id", "parent_id"
FROM "graph"
WHERE "parent_id" IS NULL
UNION
SELECT "graph"."id",
"graph"."parent_id"
FROM "graph"
INNER JOIN "cte"
ON "graph"."parent_id" = "cte"."id"
)
SELECT *
FROM "cte"
ORDER BY "id" ASC;
Output:
id | parent_id
----+-----------
1 |
2 | 1
3 | 1
3 | 5
4 | 3
5 | 4
(6 rows)
That exact same query also works on SQLite which doesn't have neither ARRAY
nor CYCLE
, output:
1|
2|1
3|1
3|5
4|3
5|4
Tested on PostgreSQL 16.4, SQLite 3.45.1, Ubuntu 24.04.
Upvotes: 0
Reputation: 31
For parent-child relationships, simply replacing UNION ALL
with UNION
prevents an infinite loop in recursive queries if you are interested in parent and child IDs only. I doubt that UNION ALL
is very useful in recursive queries at all. Here is algorithm described on PostgreSQL site:
Evaluate the non-recursive term. For
UNION
(but notUNION ALL
), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.So long as the working table is not empty, repeat these steps:
(a) Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For
UNION
(but notUNION ALL
), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.(b) Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
Here 2.a ensures that duplicates are discarded while evaluating the recursive term, so the infinite loop breaks at this point.
Upvotes: 2
Reputation: 121594
In my development environment, I always use two fuses for recursive queries or functions. My client automatically sets on startup
set statement_timeout to '10s'
It is very rare that I need more and quite often it saves me from a dead loop.
When I write a recursive query from scratch I always use an additional column that limits the number of levels involved, something like this:
with recursive cte (root, parent, depth) as (
select id, parent_id, 1
from ...
union all
select c.id, t.parent_id, depth+ 1
from ...
where depth < 10
)
select *
from cte;
In production both these ways may be problematic. Instead, you can adjust the value of the configuration parameter max_stack_depth (integer) to the anticipated needs and capabilities of the operating system and/or hardware.
See also this answer for an alternative approach and example of the new feature in Postgres 14+.
Upvotes: 29