irrational
irrational

Reputation: 799

Hierarchical parent query in postgres

I am moving from Oracle to Postgresql. I am trying to convert some Oracle hierarchical queries to Postgres. For example, in Oracle to return a comma-delimited ordered list of all ids under (i.e., the children) and including the id_to_start_with I would do the following:

SELECT LISTAGG(id_something, ',') WITHIN GROUP (ORDER BY id_something) AS somethings FROM(
SELECT DISTINCT D.id_something
FROM something_table D
START WITH D.id_something = :id_to_start_with
CONNECT BY D.id_something_parent = PRIOR D.id_something
)

The equivalent in Postgres would seem to be:

WITH RECURSIVE the_somethings(id_something) AS (
SELECT id_something
FROM something_table
WHERE id_something = $id_to_start_with
UNION ALL
SELECT D.id_something
FROM the_somethings DR
JOIN something_table D ON DR.id_something = D.id_something_parent
)
SELECT string_agg(temp_somethings.id_something::TEXT, ',') AS somethings
FROM (
SELECT id_something
FROM the_somethings
ORDER BY id_something
) AS temp_somethings

Likewise if I want to return a comma-delimited ordered list of all ids above (i.e., the parents) and including the id_to_start_with I would do the following in Oracle:

SELECT LISTAGG(id_something, ',') WITHIN GROUP (ORDER BY id_something) AS somethings FROM(
SELECT DISTINCT D.id_something
FROM something_table D
START WITH D.id_something = :id_to_start_with
CONNECT BY D.id_something = PRIOR D.id_something_parent
)

The equivalent in Postgres would seem to be:

WITH RECURSIVE the_somethings(id_something, path) AS (
SELECT id_something
, id_something::TEXT as path
FROM something_table
WHERE id_something_parent IS NULL
UNION ALL
SELECT D.id_something
, (DR.path || ',' || D.id_something::TEXT) as path
FROM something_table D
JOIN the_somethings DR ON DR.id_something = D.id_something_parent
)
SELECT path
FROM the_somethings
WHERE id_something = $id_to_start_with
ORDER BY id_something

My question has to do with the last Postgres query. It seems terribly inefficient to me and I wonder if there is a better way to write it. That is, in Oracle the query will look for the parent of the id_to_start_with, then the parent of the parent, and so forth to the root.

The Postgres query, on the other hand, gets every single root to child path combination possible and then throws everything away except for the one root to id_to_start_with that I am looking for. That is potentially a ton of data to create just to throw it all away except for the one row I am looking for.

Is there a way to get a comma-delimited ordered list of all the parents of a particular id_to_start_with that is as performant in Postgres as it is in Oracle?

Edit: Adding explain plans from Oracle and Postgres.

Oracle Explain Plan Output Oracle Plan ScreenShot

Postgres Explain Analyyze Output

CTE Scan on the_somethings  (cost=62.27..74.66 rows=3 width=76) (actual time=0.361..0.572 rows=1 loops=1)
  Filter: (id_something = 1047)
  Rows Removed by Filter: 82
  CTE the_somethings
    ->  Recursive Union  (cost=0.00..62.27 rows=551 width=76) (actual time=0.026..0.433 rows=83 loops=1)
          ->  Seq Scan on something_table  (cost=0.00..2.83 rows=1 width=8) (actual time=0.023..0.034 rows=1 loops=1)
                Filter: (id_something_parent IS NULL)
                Rows Removed by Filter: 82
          ->  Hash Join  (cost=0.33..4.84 rows=55 width=76) (actual time=0.028..0.065 rows=16 loops=5)
                Hash Cond: (d.id_something_parent = dr.id_something)
                ->  Seq Scan on something_table d  (cost=0.00..2.83 rows=83 width=16) (actual time=0.002..0.012 rows=83 loops=5)
                ->  Hash  (cost=0.20..0.20 rows=10 width=76) (actual time=0.009..0.009 rows=17 loops=5)
                      Buckets: 1024  Batches: 1  Memory Usage: 10kB
                      ->  WorkTable Scan on the_somethings dr  (cost=0.00..0.20 rows=10 width=76) (actual time=0.001..0.004 rows=17 loops=5)
Planning time: 0.407 ms
Execution time: 0.652 ms

This is the final query based on Jakub's answer below.

WITH RECURSIVE the_somethings(id_something, path, level, orig_id, id_something_parent) AS (

SELECT id_something

, id_something::TEXT as path

, 0 as level

, id_something AS orig_id

, id_something_parent

FROM something_table

WHERE id_something IN (1047, 448)

UNION ALL

SELECT D.id_something

, (D.id_something::TEXT || ',' || DR.path) as path

, DR.level + 1 as level

, DR.orig_id as orig_id

, D.id_something_parent

FROM something_table D

JOIN the_somethings DR ON D.id_something = DR.id_something_parent

)

SELECT DISTINCT ON(orig_id) orig_id, path

FROM the_somethings

ORDER BY orig_id, level DESC
;

Upvotes: 4

Views: 1560

Answers (1)

Jakub Kania
Jakub Kania

Reputation: 16487

CTEs in PostgreSQL are fenced meaning they will be materialized and only then will the filter from outer query will be applied. To make the query perform correctly build it the other way around and put the filter inside the CTE.

WITH RECURSIVE the_somethings(id_something, path) AS (
SELECT id_something
, id_something::TEXT as path, 0 as level, id_something AS orig_id
FROM something_table
WHERE id_something IN ($id_to_start_with,$id_to_start_with2)
UNION ALL
SELECT D.id_something
, (D.id_something::TEXT || ',' || DR.path) as path, DR.level + 1, DR.orig_id
FROM something_table D
JOIN the_somethings DR ON DR.id_something_parent = D.id_something
)
SELECT DISTINCT ON(orig_id) orig_id, path
FROM the_somethings
ORDER BY orig_id, DR.level DESC

Upvotes: 1

Related Questions