Reputation: 879
So I'm completely stumped trying to figure out this SQL query.
I have table of product categories that exist in a tree-like structure. To simplify let's say there are 3 top level categories: A, B, C. There is one more category above them ('All') which is the root. No products can be assigned to this category. To distinguish categories that can't be assigned to products, they have a type 'Abstract', as opposed to 'Concrete'
Each category can have any number and depth of sub-categories. I'm currently storing these with a parent id to the immediate parent (adjacency list).
Categories
Category Parent Type
All None Abstract
A All Concrete
B All Concrete
C All Concrete
D A Concrete
E D Concrete
F B Concrete
G F Concrete
H C Concrete
I C Concrete
I have another table of products with a category field. The only categories that appear in this table are the top level ones. ie. Either A, B, or C.
Products
Part Number Category
XXXX-XXXX A
XXXX-YYYY A
XXXX-ZZZZ B
YYYY-XXXX C
I would like to create a query that joins the two tables, to create rows where the Category is replaced with the child category. ie. From a pseudo code standpoint basically join on category = provided category is either equal or a descendent of category.
So something like:
select * from products
inner join categories
on products.category = descendent of category
would result in:
Part Number Category
XXXX-XXXX E (E's top level concrete parent is A)
XXXX-YYYY E (E's top level concrete parent is A)
YYYY-XXXX H (H's top level concrete parent is C)
YYYY-XXXX I (I's top level concrete parent is C)
I have this that retrieves all the concrete types up to the top level:
with recursive
concrete_parents as (
select category, parent, type
from categories
where category in ('E', 'H', 'I')
UNION ALL
select t2.category, t2.parent, t2.type
from categories as t2
inner join concrete_parents t1
on t1.parent = t2.category
where t2.type = 'Concrete'
)
select distinct * from concrete_parents
order by parent;
I can't figure out how to combine this with a inner join on the main table?
Another alternative I'm considering is using a Postgres ltree but I'm not very familiar with it.
Any thoughts?
Upvotes: 1
Views: 265
Reputation: 658482
... would be great to dynamically capture the top level concrete categories.
That seems doable since you stated:
The only categories that appear in this table (
Products
) are the top level ones, ie. A, B, or C.
So those top level categories are filtered in the final JOIN
automatically. And since those (and only those) have parent = 'All'
according to your sample data, we can shave off one level of recursion and make it a bit faster, yet:
WITH RECURSIVE parent_cat AS (
SELECT category AS original, category, parent -- no need for type
FROM categories c
WHERE category in ('A', 'D', 'H', 'I')
UNION ALL
SELECT pc.original, c.category, c.parent
FROM parent_cat pc
JOIN categories c ON c.category = pc.parent
WHERE pc.parent <> 'All' -- stop at top level, save 1 recursion
)
SELECT p.part_number, pc.category, pc.original
FROM parent_cat pc
JOIN products p USING (category)
WHERE pc.parent = 'All' -- redundant, but a bit faster
ORDER BY pc.original;
Also, no need to filter with type = 'Concrete'
, as other types are filtered by the join as well as by pc.parent = 'All'
already.
db<>fiddle here
BTW, if performance is critical and categories don't change too much, consider a MATERIALIZED VIEW
replacing the rCTE parent_cat
in the query - and implement an appropriate regime to keep it up to date.
Upvotes: 1
Reputation: 879
So I believe this is working:
WITH RECURSIVE parent_categories AS (
SELECT category, parent, type, category AS original
FROM categories
WHERE category in ('E', 'H', 'I')
UNION ALL
SELECT cat.category, cat.parent, cat.type, pc.original
FROM categories cat, parent_categories pc
WHERE cat.category = pc.parent
)
SELECT b.part_number, a.category, a.original
FROM parent_categories a
INNER JOIN products b
ON a.category = b.category
WHERE a.type = 'Concrete' AND a.category IN ('A', 'B', 'C')
I don't love it as would be great to dynamically capture the top level concrete categories. Although in this system they are very stable. If I replace ('E', 'H', 'I') with ('D', 'H', 'I') I get:
part_number category original
XXXX-XXXX A D
XXXX-YYYY A D
YYYY-XXXX C H
YYYY-XXXX C I
Or ('A', 'D', 'H', 'I') I get:
part_number category original
XXXX-XXXX A A
XXXX-YYYY A A
XXXX-XXXX A D
XXXX-YYYY A D
YYYY-XXXX C H
YYYY-XXXX C I
I haven't tested rigorously, but it does seem to give the results I would want.
Open to whether there is a more elegant solution that doesn't require hard coding the top concrete categories in the query.
Upvotes: 0