Reputation: 4203
Consider a blog application which has tables for posts, categories and a lookup table linking a post with one or more categories. Categories are hierarchical. Posts can be assigned to any category, not just leaf nodes.
The categories table has a post_count
field which caches the number of posts assigned to the specific category. It also has parent_id
, lft
and rght
columns for MPTT.
But it also has a under_post_count
field which caches the number of distinct posts assigned to it or any of it's child categories. This is useful so you can display a hierarchical list of categories with the number of posts assigned to it, or one of its children, next to it.
My application has got to the point where after a post is created with categories, or it's categories are edited or one which had categories is deleted, I have a list of category IDs of the old and new categories, whose post_count
field needs updating. What I was hoping I could do next is in a single query, update the under_post_count
fields for all those categories identified, and all their parents, with the number of distinct posts assigned to each category or any of it's children.
Here's the SQL required to create the tables and some test data for categories:
CREATE TABLE `categories` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rght` int(11) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`post_count` int(11) NOT NULL DEFAULT '0',
`under_post_count` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
CREATE TABLE `categories_posts` (
`category_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`category_id`,`post_id`)
) ENGINE=MyISAM;
INSERT INTO `categories` (`id`, `parent_id`, `lft`, `rght`, `name`) VALUES
(1, NULL, 1, 8, 'Cat 1'),
(4, 1, 2, 3, 'Cat 1.1'),
(5, 1, 4, 5, 'Cat 1.2'),
(6, 1, 6, 7, 'Cat 1.3'),
(2, NULL, 9, 16, 'Cat 2'),
(7, 2, 10, 11, 'Cat 2.1'),
(8, 2, 12, 13, 'Cat 2.2'),
(9, 2, 14, 15, 'Cat 2.3'),
(3, NULL, 17, 24, 'Cat 3'),
(10, 3, 18, 19, 'Cat 3.1'),
(11, 3, 20, 21, 'Cat 3.2'),
(12, 3, 22, 23, 'Cat 3.3');
Run this a few times to create some test data for the categories_posts
table:
INSERT IGNORE INTO `categories_posts` (`category_id`, `post_id`)
SELECT `id`, CEILING(10 * RAND()) FROM `categories` ORDER BY RAND() LIMIT 6
Can anyone figure this out, your help would be much appreciated?
Upvotes: 1
Views: 518
Reputation: 12704
well there are a few ways to skin a cat here (assuming 5.1 and triggers)
you can update everything from application layer
you can trigger updates to post_count
from categories_posts
and trigger updates (cascade) to under_post_count
from categories
finally, you can trigger all updates from categories_posts
Also depending on the actual number of categories, you might not need to denormalize under_post_count
as it should be rather easy and inexpensive to fetch it with
SELECT c.id, SUM(cc.post_count)
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
GROUP BY c.id;
Fetching the actual counts on exact match is
SELECT c.id, COUNT(*)
FROM categories c
LEFT JOIN categories_posts cp ON c.id = cp.post_id
GROUP BY c.id;
Combining the two gives the counts including the hierarchies
SELECT c.id, COUNT(*)
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
LEFT JOIN categories_posts cp ON cc.id = cp.post_id
GROUP BY c.id;
EDIT
Constructing update statements from the above should not be so hard
UPDATE categories
SET post_count = (SELECT COUNT(*)
FROM categories_posts cp
WHERE cp.post_id = categories.id)
should work for the post_count
Situation for the under_post_count
is different since mysql does not like to hear that target table is mentioned in the where part, therefore you have to do some monstrosity like this
UPDATE categories LEFT JOIN
(SELECT c.id, COUNT(*) AS result
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
INNER JOIN categories_posts cp ON cc.id = cp.post_id
GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = x.result
EDIT2
Actually there is an error in all of the above queries - whenever I joined categories and posts I should have joined on cc.id = cp.category_id
and not cp.post_id
, which then I didn't check. Don't feel like correcting... but only in this last query
UPDATE categories LEFT JOIN
(SELECT c.id, COUNT(*) AS result
FROM categories c
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
INNER JOIN categories_posts cp ON cc.id = cp.category_id
INNER JOIN posts p ON cp.post_id = p.id
WHERE p.status = 'published'
GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = x.result,
post_count = (SELECT COUNT(*)
FROM categories_posts cp
WHERE cp.category_id = categories.id)
EDIT3
Just few notes:
under_post_count
and post_count
regardless of the state of the data,post_count = post_count +/- 1
on appropriate records in statuses (similary for under_post_count
),Upvotes: 3