Reputation: 5637
I have the following recursive function in PostgreSQL. but I wonder how it works
Can anyone help me explain this?
CREATE OR REPLACE FUNCTION build_hierarchey(location_id int) RETURNS SETOF jsonb
AS $BODY$
BEGIN
RETURN QUERY
SELECT
CASE WHEN COUNT(x) > 0
THEN ((to_jsonb(t) || jsonb_build_object('Children', jsonb_agg(f.x)))
ELSE to_jsonb(t)
END
FROM "Locations" t
LEFT JOIN build_hierarchey(t."Id") AS f(x) ON true
WHERE t."ParentLocationId" = location_id OR (location_id IS null AND t."ParentLocationId" IS null)
GROUP BY t."Id", t."Name";
END;
$BODY$ LANGUAGE 'plpgsql'
this is how I use it.
select jsonb_agg(build_hierarchey) from build_hierarchey(null::int)
Result in json (hirarchey)
[
{
"Id": 11,
"Name": "Zone-C",
"Children": [
{
"Id": 23,
"Name": "01-C",
"CategoryId": null
},
{
"Id": 20,
"Name": "01-A",
"CategoryId": null
}
],
"CategoryId": null
},
{
"Id": 19,
"Name": "Zone-K",
"CategoryId": null
},
{
"Id": 1,
"Name": "ccc",
"Children": [
{
"Id": 3,
"Name": "01-A",
"CategoryId": null
},
{
"Id": 5,
"Name": "01-C",
"Children": [
{
"Id": 8,
"Name": "01-C-03",
"CategoryId": null
},
{
"Id": 7,
"Name": "01-C-02",
"CategoryId": null
},
{
"Id": 6,
"Name": "01-C-01",
"CategoryId": null
}
],
"CategoryId": null
},
{
"Id": 4,
"Name": "01-B",
"CategoryId": null
}
],
"CategoryId": null
},
{
"Id": 18,
"Name": "Zone-J",
"CategoryId": null
},
{
"Id": 2,
"Name": "Zone-B",
"Children": [
{
"Id": 10,
"Name": "02-A",
"CategoryId": null
},
{
"Id": 9,
"Name": "01-A",
"CategoryId": null
}
],
"CategoryId": null
},
{
"Id": 16,
"Name": "Zone-H",
"CategoryId": null
},
{
"Id": 15,
"Name": "Zone-G",
"CategoryId": null
},
{
"Id": 14,
"Name": "Zone-F",
"CategoryId": null
},
{
"Id": 17,
"Name": "Zone-I",
"CategoryId": null
},
{
"Id": 22,
"Name": "Zone-AA",
"CategoryId": null
}
]
Here is the "Locations" table. (self-referencing)
SELECT "Id", "Name", "ParentLocationId" FROM "Locations"
ID Name ParentLocationId
1 "ccc"
2 "Zone-B"
3 "01-A" 1
4 "01-B" 1
5 "01-C" 1
6 "01-C-01" 5
7 "01-C-02" 5
8 "01-C-03" 5
9 "01-A" 2
10 "02-A" 2
11 "Zone-C"
14 "Zone-F"
15 "Zone-G"
Upvotes: 3
Views: 1255
Reputation: 97908
As with many recursive functions, we can break this down conceptually into several steps.
First, we have the base case - when given null
as input, the query selects all locations
with no parent:
SELECT
...
FROM "Locations" t
WHERE location_id IS null AND t."ParentLocationId" IS null
Next, we have the repeated case - when given an ID as input, the query selects all locations
which have that ID as their parent:
SELECT
...
FROM "Locations" t
WHERE t."ParentLocationId" = location_id
Then we add the recursion - when returning either of the above lists, also call the function again, with each ID found:
SELECT
...
FROM "Locations" t
LEFT JOIN build_hierarchey(t."Id") AS f(x) ON true
The ON true
clause is there because there has to be some condition, but the actual rows to join are determined by the t."Id"
passed into the function. It needs to be a LEFT JOIN
, because the function will sometimes generate no rows; e.g. with your sample data build_hierarchey(3)
will return no rows, but we don't want that to exclude location 3 from the results of build_hierarchey(1)
.
Now we need to combine the results of the current query (t
) with the results of the recursive function call (f
):
SELECT
CASE WHEN COUNT(x) > 0
THEN ((to_jsonb(t) || jsonb_build_object('Children', jsonb_agg(f.x)))
ELSE to_jsonb(t)
END
FROM ... as t
LEFT JOIN ... as f(x)
GROUP BY t."Id", t."Name";
Again, this has to handle the case where the function returned no rows. The rest is just formatting: flattening each set of results into a JSON object and combining them in the desired structure.
Now, to your questions:
the sequence of execution (it goes deep hierarchy by each root node first or list all root node first then go to next level.)
In SQL, it's rarely useful to talk about order of execution, because a DBMS doesn't generate one row at a time and spit out the result, it tries to generate the requested data as efficiently as possible. As far as I know, the DBMS is free to choose whether all the rows at one level are loaded into memory, and the function run on each; or whether the function is run depth-first as each row is encountered.
Generally speaking, the only thing that is guaranteed is that the results will come out in the order requested by the query. In this case, there is no ORDER BY
clause, so the actual order of results at each level is also completely undefined - the DBMS will pick whichever order is most convenient when running the query.
how it breaks from this function.
This function is not actually guaranteed to exit; its exit condition relies on the assumption that your data contains no cycles - that is, that it will eventually reach a node with no "children" (i.e., an ID that never appears in the ParentLocationId
column).
To demonstrate a cycle, consider this data:
ID Name ParentLocationId
1 "A" 3
2 "B" 1
3 "C" 2
A call to build_hierarchey(1)
will proceed as follows:
This is probably prevented elsewhere in the application, but we also have one extra protection: by calling build_hierarchey(null)
we can never encounter such a cycle. We can prove this as follows:
null
as their parent are not part of a cyclenull
, because that parent could not be part of the cyclenull
(which is what build_hierarchey(null)
does) will never reach a cycle, even if one exists in the tableUpvotes: 3