Jongz Puangput
Jongz Puangput

Reputation: 5637

Postgresql recursive function

I have the following recursive function in PostgreSQL. but I wonder how it works

  1. the sequence of execution (it goes deep hierarchy by each root node first or list all root node first then go to next level.)
  2. how it breaks from this function.

Can anyone help me explain this?

CREATE OR REPLACE FUNCTION build_hierarchey(location_id int) RETURNS SETOF jsonb 
AS $BODY$
    BEGIN
        RETURN QUERY 
        SELECT
            CASE WHEN COUNT(x) > 0 
                THEN ((to_jsonb(t) || jsonb_build_object('Children', jsonb_agg(f.x)))
                ELSE to_jsonb(t)
            END
        FROM "Locations" t
        LEFT JOIN build_hierarchey(t."Id") AS f(x) ON true
        WHERE t."ParentLocationId" = location_id OR (location_id IS null AND t."ParentLocationId" IS null)
        GROUP BY t."Id", t."Name";                                                       
    END;
$BODY$ LANGUAGE 'plpgsql'

this is how I use it.

select jsonb_agg(build_hierarchey) from build_hierarchey(null::int)

Result in json (hirarchey)

[
  {
    "Id": 11,
    "Name": "Zone-C",
    "Children": [
      {
        "Id": 23,
        "Name": "01-C",
        "CategoryId": null
      },
      {
        "Id": 20,
        "Name": "01-A",
        "CategoryId": null
      }
    ],
    "CategoryId": null
  },
  {
    "Id": 19,
    "Name": "Zone-K",
    "CategoryId": null
  },
  {
    "Id": 1,
    "Name": "ccc",
    "Children": [
      {
        "Id": 3,
        "Name": "01-A",
        "CategoryId": null
      },
      {
        "Id": 5,
        "Name": "01-C",
        "Children": [
          {
            "Id": 8,
            "Name": "01-C-03",
            "CategoryId": null
          },
          {
            "Id": 7,
            "Name": "01-C-02",
            "CategoryId": null
          },
          {
            "Id": 6,
            "Name": "01-C-01",
            "CategoryId": null
          }
        ],
        "CategoryId": null
      },
      {
        "Id": 4,
        "Name": "01-B",
        "CategoryId": null
      }
    ],
    "CategoryId": null
  },
  {
    "Id": 18,
    "Name": "Zone-J",
    "CategoryId": null
  },
  {
    "Id": 2,
    "Name": "Zone-B",
    "Children": [
      {
        "Id": 10,
        "Name": "02-A",
        "CategoryId": null
      },
      {
        "Id": 9,
        "Name": "01-A",
        "CategoryId": null
      }
    ],
    "CategoryId": null
  },
  {
    "Id": 16,
    "Name": "Zone-H",
    "CategoryId": null
  },
  {
    "Id": 15,
    "Name": "Zone-G",
    "CategoryId": null
  },
  {
    "Id": 14,
    "Name": "Zone-F",
    "CategoryId": null
  },
  {
    "Id": 17,
    "Name": "Zone-I",
    "CategoryId": null
  },
  {
    "Id": 22,
    "Name": "Zone-AA",
    "CategoryId": null
  }
]

Here is the "Locations" table. (self-referencing)

SELECT "Id", "Name", "ParentLocationId" FROM "Locations"

ID  Name      ParentLocationId
1   "ccc"       
2   "Zone-B"    
3   "01-A"      1
4   "01-B"      1
5   "01-C"      1
6   "01-C-01"   5
7   "01-C-02"   5
8   "01-C-03"   5
9   "01-A"      2
10  "02-A"      2
11  "Zone-C"    
14  "Zone-F"    
15  "Zone-G"    

Upvotes: 3

Views: 1255

Answers (1)

IMSoP
IMSoP

Reputation: 97908

As with many recursive functions, we can break this down conceptually into several steps.

First, we have the base case - when given null as input, the query selects all locations with no parent:

SELECT
    ...
FROM "Locations" t
WHERE location_id IS null AND t."ParentLocationId" IS null

Next, we have the repeated case - when given an ID as input, the query selects all locations which have that ID as their parent:

SELECT
    ...
FROM "Locations" t
WHERE t."ParentLocationId" = location_id

Then we add the recursion - when returning either of the above lists, also call the function again, with each ID found:

SELECT
    ...
FROM "Locations" t
LEFT JOIN build_hierarchey(t."Id") AS f(x) ON true

The ON true clause is there because there has to be some condition, but the actual rows to join are determined by the t."Id" passed into the function. It needs to be a LEFT JOIN, because the function will sometimes generate no rows; e.g. with your sample data build_hierarchey(3) will return no rows, but we don't want that to exclude location 3 from the results of build_hierarchey(1).

Now we need to combine the results of the current query (t) with the results of the recursive function call (f):

SELECT
    CASE WHEN COUNT(x) > 0 
        THEN ((to_jsonb(t) || jsonb_build_object('Children', jsonb_agg(f.x)))
        ELSE to_jsonb(t)
    END
FROM ... as t
LEFT JOIN ... as f(x)
GROUP BY t."Id", t."Name";

Again, this has to handle the case where the function returned no rows. The rest is just formatting: flattening each set of results into a JSON object and combining them in the desired structure.

Now, to your questions:

the sequence of execution (it goes deep hierarchy by each root node first or list all root node first then go to next level.)

In SQL, it's rarely useful to talk about order of execution, because a DBMS doesn't generate one row at a time and spit out the result, it tries to generate the requested data as efficiently as possible. As far as I know, the DBMS is free to choose whether all the rows at one level are loaded into memory, and the function run on each; or whether the function is run depth-first as each row is encountered.

Generally speaking, the only thing that is guaranteed is that the results will come out in the order requested by the query. In this case, there is no ORDER BY clause, so the actual order of results at each level is also completely undefined - the DBMS will pick whichever order is most convenient when running the query.

how it breaks from this function.

This function is not actually guaranteed to exit; its exit condition relies on the assumption that your data contains no cycles - that is, that it will eventually reach a node with no "children" (i.e., an ID that never appears in the ParentLocationId column).

To demonstrate a cycle, consider this data:

ID  Name      ParentLocationId
1   "A"       3
2   "B"       1
3   "C"       2

A call to build_hierarchey(1) will proceed as follows:

  • Fetch nodes with parent 1: found ID=2
  • Fetch nodes with parent 2: found ID=3
  • Fetch nodes with parent 3: found ID=1
  • Fetch nodes with parent 1: infinite loop

This is probably prevented elsewhere in the application, but we also have one extra protection: by calling build_hierarchey(null) we can never encounter such a cycle. We can prove this as follows:

  • Every node has either one parent or none
  • A cycle consists of following parents and reaching the same node
  • For a node to form part of a cycle, it must have a parent
  • Therefore the nodes with null as their parent are not part of a cycle
  • If a node is in a cycle, its parent must be in that same cycle (cycle A-B-C-A can be extended to A-B-C-A-B, and so on)
  • Therefore, a node in a cycle cannot have a parent whose parent is null, because that parent could not be part of the cycle
  • Therefore, starting with a node whose parent is null (which is what build_hierarchey(null) does) will never reach a cycle, even if one exists in the table

Upvotes: 3

Related Questions