Reputation: 1604
I'm working on an ECommerce website, in which there are 2 database tables in MySQL, one is products and the other one is taxonomies, products and taxonomies are many to many relationship, and taxonomies have a tree structure, meaning there's a parent_id field in taxonomies table to identify the parent id of a taxonomy.
When user selects one taxonomy, I want to get all the products that belong to this taxonomy and all its offspring taxonomies, I did this by first finding out all the offspring taxonomies of the selected taxonomy, then get paginated products result from there, but in my site there are in total 5000 taxonomies, and my solution makes the site slow like a dog...... Any advice on how I could achieve this for the sake of performance?
products table:
+-------------------+----------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+----------------------+------+-----+---------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| code | bigint(20) | NO | UNI | NULL | |
| SKU | varchar(255) | NO | | NULL | |
| name | varchar(100) | NO | | NULL | |
| description | varchar(2000) | NO | | NULL | |
| short_description | varchar(200) | NO | | NULL | |
| price | decimal(8,2) | NO | | 0.00 | |
| discounted_price | decimal(8,2) | NO | | 0.00 | |
| stock | smallint(5) unsigned | NO | | 0 | |
| sales | smallint(5) unsigned | NO | | 0 | |
| num_reviews | smallint(6) | NO | | 0 | |
| weight | decimal(5,2) | NO | | 0.00 | |
| overall_rating | decimal(3,2) | NO | | 5.00 | |
| activity_id | int(10) unsigned | YES | MUL | NULL | |
| created_at | timestamp | NO | | 0000-00-00 00:00:00 | |
| updated_at | timestamp | NO | | 0000-00-00 00:00:00 | |
+-------------------+----------------------+------+-----+---------------------+----------------+
taxonomies table:
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(100) | YES | UNI | NULL | |
| parent_id | int(10) unsigned | YES | MUL | NULL | |
| num_products | smallint(6) | NO | | 0 | |
+--------------+------------------+------+-----+---------+----------------+
product_taxonomy table:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| product_id | int(10) unsigned | NO | MUL | NULL | |
| taxonomy_id | int(10) unsigned | NO | MUL | NULL | |
+-------------+------------------+------+-----+---------+----------------+
Upvotes: 0
Views: 153
Reputation: 845
In case depth of single level one can use the following query
SELECT * FROM `product_taxonomy`
INNER JOIN (SELECT * FROM `taxonomies` WHERE `id` = 100 OR `parent_id` = 100) `taxonomies`
ON `product_taxonomy`.`taxonomy_id` = `taxonomies`.`id`
LEFT JOIN `products` ON `product_taxonomy`.`product_id` = `products`.`id`
You can add limit, offset to the above query for pagination.
100 in the above query represents the taxonomy id requested by the user.
Apart from this I would suggest :-
1) id
in your product table to renamed if possible to product_id
as referenced in your product_taxonomy and I presume in other tables, similarly taxonomy_id
.
This way when you join query column name would be the same.
2) I hope product_taxonomy
.product_id
, product_taxonomy
.taxonomy_id
are indexed for faster querying.
Update:
What you had mentioned in the comment below is a hierarchical data problem and not what relational database ideally intended for.
Solution 1
IF you know for sure that you will have only 4 levels / generation then you can do 4 join queries.
I can elaborate on this if you need to.
Solution 2
If you are not too deep or committed to the architecture of this project I would recommend restructuring it such a way, where recursion is taken care of by the server side scripting. i.e You change your CMS/taxonomy management in such a way that whenever you add/remove/modify taxonomy the script will update a table called taxonomy_childs
with all possible offspring for a given category so that you have a flat data at your disposal when you need it.
Personally I would prefer this. I always like my database to match my business logic requirement.
I can elaborate on this if you need to.
Solution 3
As mentioned earlier hierarchical data is not a strong point of a relational database. Having said that you can implement something called as Nested Set Model.
Please read more at http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
You would need to add 3 columns to your taxonomy table :- level_depth, lft, rht.
Please let me know which solution would you want me to elaborate.
Upvotes: 1