Steven
Steven

Reputation: 879

SQL: Join (Inner Join) with matching ancestor from another table

So I'm completely stumped trying to figure out this SQL query.

I have table of product categories that exist in a tree-like structure. To simplify let's say there are 3 top level categories: A, B, C. There is one more category above them ('All') which is the root. No products can be assigned to this category. To distinguish categories that can't be assigned to products, they have a type 'Abstract', as opposed to 'Concrete'

Each category can have any number and depth of sub-categories. I'm currently storing these with a parent id to the immediate parent (adjacency list).

Categories

Category   Parent    Type
All        None      Abstract
A          All       Concrete
B          All       Concrete
C          All       Concrete
D          A         Concrete
E          D         Concrete
F          B         Concrete
G          F         Concrete
H          C         Concrete
I          C         Concrete

I have another table of products with a category field. The only categories that appear in this table are the top level ones. ie. Either A, B, or C.

Products

Part Number       Category
XXXX-XXXX         A
XXXX-YYYY         A
XXXX-ZZZZ         B
YYYY-XXXX         C

I would like to create a query that joins the two tables, to create rows where the Category is replaced with the child category. ie. From a pseudo code standpoint basically join on category = provided category is either equal or a descendent of category.

So something like:

select * from products
inner join categories
on products.category = descendent of category

would result in:

Part Number       Category
XXXX-XXXX         E (E's top level concrete parent is A)
XXXX-YYYY         E (E's top level concrete parent is A)
YYYY-XXXX         H (H's top level concrete parent is C)
YYYY-XXXX         I (I's top level concrete parent is C)

I have this that retrieves all the concrete types up to the top level:

with recursive
concrete_parents as (
  select category, parent, type
  from categories
  where category in ('E', 'H', 'I')
  UNION ALL
    select t2.category, t2.parent, t2.type
    from categories as t2
    inner join concrete_parents t1
    on t1.parent = t2.category
    where t2.type = 'Concrete'
)

select distinct * from concrete_parents
order by parent;

I can't figure out how to combine this with a inner join on the main table?

Another alternative I'm considering is using a Postgres ltree but I'm not very familiar with it.

Any thoughts?

Upvotes: 1

Views: 265

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658482

... would be great to dynamically capture the top level concrete categories.

That seems doable since you stated:

The only categories that appear in this table (Products) are the top level ones, ie. A, B, or C.

So those top level categories are filtered in the final JOIN automatically. And since those (and only those) have parent = 'All' according to your sample data, we can shave off one level of recursion and make it a bit faster, yet:

WITH RECURSIVE parent_cat AS (
   SELECT category AS original, category, parent -- no need for type
   FROM   categories      c
   WHERE  category in ('A', 'D', 'H', 'I')

   UNION ALL
   SELECT pc.original, c.category, c.parent
   FROM   parent_cat pc
   JOIN   categories c ON c.category = pc.parent
   WHERE  pc.parent <> 'All'  -- stop at top level, save 1 recursion
   )
SELECT p.part_number, pc.category, pc.original 
FROM   parent_cat pc
JOIN   products   p USING (category)
WHERE  pc.parent = 'All'      -- redundant, but a bit faster
ORDER  BY pc.original;

Also, no need to filter with type = 'Concrete', as other types are filtered by the join as well as by pc.parent = 'All' already.

db<>fiddle here

BTW, if performance is critical and categories don't change too much, consider a MATERIALIZED VIEW replacing the rCTE parent_cat in the query - and implement an appropriate regime to keep it up to date.

Upvotes: 1

Steven
Steven

Reputation: 879

So I believe this is working:

WITH RECURSIVE parent_categories AS (
SELECT category, parent, type, category AS original
FROM categories
WHERE category in ('E', 'H', 'I')

UNION ALL

SELECT cat.category, cat.parent, cat.type, pc.original
FROM categories cat, parent_categories pc
WHERE cat.category = pc.parent
)
SELECT b.part_number, a.category, a.original 
FROM parent_categories a
INNER JOIN products b
ON a.category = b.category
WHERE a.type = 'Concrete' AND a.category IN ('A', 'B', 'C')

Fiddle

I don't love it as would be great to dynamically capture the top level concrete categories. Although in this system they are very stable. If I replace ('E', 'H', 'I') with ('D', 'H', 'I') I get:

part_number category    original
XXXX-XXXX   A           D
XXXX-YYYY   A           D
YYYY-XXXX   C           H
YYYY-XXXX   C           I

Or ('A', 'D', 'H', 'I') I get:

part_number category    original
XXXX-XXXX   A           A
XXXX-YYYY   A           A
XXXX-XXXX   A           D
XXXX-YYYY   A           D
YYYY-XXXX   C           H
YYYY-XXXX   C           I 

I haven't tested rigorously, but it does seem to give the results I would want.

Open to whether there is a more elegant solution that doesn't require hard coding the top concrete categories in the query.

Upvotes: 0

Related Questions