Ethan Cox
Ethan Cox

Reputation: 78

MySQL: SQL and DB for product with multiple categories

I am working on a website which hold millions of records now (apologies cannot reveal which site) initially it had few hundred records so the query below was acceptable

Query: SELECT * FROM….WHERE category LIKE ‘%,3,%’;

But now it just kills the database as for each query it has to go through the entire 2Mil records with above query

Category table
ID NAME
1 Female
2 Fashion
3 Clothing
4 Accessories
5 Top
6 Dress
7 Earring
8 Short dress
9 Long dress
10 Male

Product table
ID…..Category….other bits
1 ,1,2,3,6,9, ……
2 ,1,2,4,7,
3 ,1,2,3,5,
4 ,10,2,3,4,

you have the picture as what is happening above. Now if I do FullText index on category row in product table it gives only 1 cardinality :(

How can I overcome this? I have considered duplicating row with each category but the database is huge currently 2 GIG and with duplicates it will turn roughly 10 GIG… more like a problem then a solution

Upvotes: 3

Views: 11017

Answers (3)

Bill Karwin
Bill Karwin

Reputation: 562230

Keep in mind that storing numbers as strings takes about twice as many bytes per digit as storing numbers as integers. Plus all those commas.

So if you're concerned about space, it won't be as much expansion as you fear to store the data in a normalized fashion.

And it will allow you to write proper queries that take advantage of indexes. So if there is some expansion, you will have traded a little bit of storage space for a big improvement in speed.

Tip: if you're using InnoDB, the primary key doesn't cost any storage because the table itself is stored as the primary key index. You should define your normalized table with the category id first and then the product id second, if you need to optimize for searches by category.

CREATE TABLE CategoryProduct (
  categoryid INT,
  productid INT,
  PRIMARY KEY (categoryid, productid)
);

See also my answer to Is storing a delimited list in a database column really that bad? for more disadvantages to using comma-separated lists.

Upvotes: 2

rutter
rutter

Reputation: 11452

One solution I've seen is to use three tables:

  • categories lists your categories
  • products lists your products, without any attached category information
  • category_map is a special table: each row links a product_id to a category_id

To look up products by category, you can then match rows in category_map against rows in products.

This is an imperfect example, but it gets the gist of it:

SELECT * FROM 
(
    SELECT * FROM category_map 
    WHERE category_id=1
) AS map 
INNER JOIN products 
ON products.id = map.product_id;

Table joins are a very powerful tool; you may want to spend some time reading up on them, if you're new to using them. Coding Horror has a visual explanation that skims over the details.

It would be a good idea to set up foreign key constraints or otherwise make sure that entries in category_map correspond to existing entries in products and categories.

Upvotes: 0

MadAsAHatter
MadAsAHatter

Reputation: 825

I would consider a new table, say Product_Category (unimaginative I know) where each row contains a column for a Foreign Key (FK) relation to the Product.id and a column for the category.

The category column can probably be a TINYINT which would only require 1 byte to store while I guess the FK column would be the same as the Product.id column (probably INT - 4 bytes), you could then index both columns so you can either find out which categories a product belongs to as well as which products belong in a category. Also, this table wouldn't need to have a Primary Key (i.e. id), saving you an an extra 4 bytes.

(see MySQL Data Type Storage Requirements)

With this solution each row in this new database would take up about 5 bytes. Since each character in the sting takes up 1 byte (Assuming ASCII and latin1 encoding), you would be looking at an increase of 3 bytes (including comma) per category per product by removing Product.category and putting the items into Product_Category, however that's no where near as big a gain as duplicating entire product rows. However, there is the cost of changing your code (unless you're far better than I am at joins).

Does this help any?

Upvotes: 1

Related Questions