madflow
madflow

Reputation: 8490

Prevent infinite loop in recursive query in Postgresql

Here: (To find infinite recursive loop in CTE) is a discussion how to prevent an infinite loop in a recursive query. There the recursion is prevented on the "query level" - at least in an answer about Postgresql.

Is there a way in Postgresql (10) to implement some kind of safety net to prevent infinite recursions? Is it a feasible way to use statement_timeout for this or is there any other widely accepted way?

Upvotes: 19

Views: 14698

Answers (3)

If you don't need to return the recursion depth as an output, then using UNION instead of UNION ALL is the way to go on both PostgreSQL and SQLite as mentioned at: https://stackoverflow.com/a/77165059/895245

However if you also want to track the depth of results, that won't do as the depth column makes the values become different from one another.

The solutions from the answer To find infinite recursive loop in CTE cover that as well however.

Test data

Create some demo data:

CREATE TABLE "graph" ("id" integer, "parent_id" integer);
INSERT INTO "graph" VALUES
(1, null), -- root element
(2, 1),
(3, 1),
(4, 3), 
(5, 4), 
(3, 5); -- endless loop

Graph represented:

1 --+--> 2
    |
    +--> 3 --> 4 --> 5 --+
         ^               |
         |               |
         +---------------+

PostgreSQL 14+: CYCLE keyword

WITH RECURSIVE "cte"("id", "parent_id", "depth") AS (
  SELECT "id",
         "parent_id",
         0
  FROM "graph"
  WHERE "parent_id" IS NULL

  UNION ALL

  SELECT "graph"."id",
         "graph"."parent_id",
         "cte"."depth" + 1
  FROM "graph"
    INNER JOIN "cte"
    ON "graph"."parent_id" = "cte"."id"
)
CYCLE "id" -- track cycles for this column
  SET "is_cycle" -- adds a boolean column is_cycle
  USING "path" -- adds a column that contains all parents for the id
SELECT *
FROM "cte"
WHERE NOT "is_cycle"
ORDER BY "id" ASC;

Output:

 id | parent_id | depth | is_cycle |       path        
----+-----------+-------+----------+-------------------
  1 |           |     0 | f        | {(1)}
  2 |         1 |     1 | f        | {(1),(2)}
  3 |         1 |     1 | f        | {(1),(3)}
  4 |         3 |     2 | f        | {(1),(3),(4)}
  5 |         4 |     3 | f        | {(1),(3),(4),(5)}
(5 rows)

So we understand that this method tracks the paths down to each element and uses that information to detect loops and stop before.

Detect loops instead of ignoring them

If we wanted instead to check if a loop exists in the DB we could just change:

WHERE NOT "is_cycle";

to:

WHERE "is_cycle";

which then gives us the loop path:

 id | parent_id | depth | is_cycle |         path          
----+-----------+-------+----------+-----------------------
  3 |         5 |     4 | t        | {(1),(3),(4),(5),(3)}

See also: To find infinite recursive loop in CTE

PostgreSQL pre-14: explicit array manipulation

The CYCLE keyword is a shortcut for the following which uses an explicit array to track the path and detect loops:

WITH RECURSIVE "cte"("id", "parent_id", "depth", "is_cycle", "path") AS (
  SELECT "id",
         "parent_id",
         0,
         false,
         ARRAY["id"] as "path"
  FROM "graph"
  WHERE "parent_id" IS NULL

  UNION ALL

  SELECT "graph"."id",
         "graph"."parent_id",
         "cte"."depth" + 1,
         "graph"."id" = any("path"),
         "cte"."path" || "graph"."id"
  FROM "graph"
    INNER JOIN "cte"
    ON "graph"."parent_id" = "cte"."id"
    AND NOT "is_cycle"
)
SELECT *
FROM "cte"
WHERE NOT "is_cycle"
ORDER BY "id" ASC;

Output:

 id | parent_id | depth | is_cycle |   path    
----+-----------+-------+----------+-----------
  1 |           |     0 | f        | {1}
  2 |         1 |     1 | f        | {1,2}
  3 |         1 |     1 | f        | {1,3}
  4 |         3 |     2 | f        | {1,3,4}
  5 |         4 |     3 | f        | {1,3,4,5}
(5 rows)

Using UNION instead of UNION ALL when depth is not needed

Just to give an example of that method:

WITH RECURSIVE "cte"("id", "parent_id") AS (
  SELECT "id", "parent_id"
  FROM "graph"
  WHERE "parent_id" IS NULL

  UNION

  SELECT "graph"."id",
         "graph"."parent_id"
  FROM "graph"
    INNER JOIN "cte"
    ON "graph"."parent_id" = "cte"."id"
)
SELECT *
FROM "cte"
ORDER BY "id" ASC;

Output:

 id | parent_id 
----+-----------
  1 |          
  2 |         1
  3 |         1
  3 |         5
  4 |         3
  5 |         4
(6 rows)

That exact same query also works on SQLite which doesn't have neither ARRAY nor CYCLE, output:

1|
2|1
3|1
3|5
4|3
5|4

Tested on PostgreSQL 16.4, SQLite 3.45.1, Ubuntu 24.04.

Upvotes: 0

mk73
mk73

Reputation: 31

For parent-child relationships, simply replacing UNION ALL with UNION prevents an infinite loop in recursive queries if you are interested in parent and child IDs only. I doubt that UNION ALL is very useful in recursive queries at all. Here is algorithm described on PostgreSQL site:

  1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

  2. So long as the working table is not empty, repeat these steps:

    • (a) Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

    • (b) Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

Here 2.a ensures that duplicates are discarded while evaluating the recursive term, so the infinite loop breaks at this point.

Upvotes: 2

klin
klin

Reputation: 121594

In my development environment, I always use two fuses for recursive queries or functions. My client automatically sets on startup

set statement_timeout to '10s'

It is very rare that I need more and quite often it saves me from a dead loop.

When I write a recursive query from scratch I always use an additional column that limits the number of levels involved, something like this:

with recursive cte (root, parent, depth) as (
    select id, parent_id, 1
    from ...
union all
    select c.id, t.parent_id, depth+ 1
    from ...
    where depth < 10
)
select * 
from cte;

In production both these ways may be problematic. Instead, you can adjust the value of the configuration parameter max_stack_depth (integer) to the anticipated needs and capabilities of the operating system and/or hardware.

See also this answer for an alternative approach and example of the new feature in Postgres 14+.

Upvotes: 29

Related Questions