Oğuz Can Sertel
Oğuz Can Sertel

Reputation: 759

Ecommerce Database structure for product feautures in MYSQL (innoDB)

I need to know which database model is better for performance.

First Database Model

Three tables. Features, Products, Feature values.

Feature table is

+----+-----------+
| id | name      |
+----+-----------+
|  1 | Brand     |
|  2 | Color     |
|  3 | Dimension |
|  4 | Model     |
+----+-----------+

And Feature values table

+----+---------+------------+
| id | name    | feature_id |
+----+---------+------------+
|  1 | Sony    |          1 |
|  2 | Samsung |          1 |
|  3 | Red     |          2 |
|  4 | Blue    |          2 |
|  5 | 20 "    |          3 |
|  6 | 30 "    |          3 |
|  7 | Model A |          4 |
|  8 | Model B |          4 |
+----+---------+------------+

And products table.

+----+--------------------+----------+
| id | product_name       | features |
+----+--------------------+----------+
|  1 | Sony Television    | 1-3-5-7  |
|  2 | Samsung Television | 2-4-6-8  |
+----+--------------------+----------+

As you see in this structure, if a user wants to search products according to feature, I need to use REGEXP or fulltext search in my query.

Second Database Model

In the second database model, I will delete features from products table and I will add new table called product_features.

+----+--------------------+
| id | product_name       |
+----+--------------------+
|  1 | Sony Television    |
|  2 | Samsung Television |
+----+--------------------+

And new table product_features;

+----+------------+------------+
| id | feature_id | product_id |
+----+------------+------------+
|  1 |          1 |          1 |
|  2 |          3 |          1 |
|  3 |          5 |          1 |
|  4 |          7 |          1 |
|  5 |          2 |          2 |
|  6 |          4 |          2 |
|  7 |          6 |          2 |
|  8 |          8 |          2 |
+----+------------+------------+

Now if a user wants to search products according to feature, I need to search product_features and then join products.

Question

My question is if I use second model and I have over 20 billion rows in products table, and think there are at least 10 features for every product, So product_features table will have over 200 billions rows. And maybe queries according to features will be slow.

If I use first model, when user search according to features, I have to query with fulltext search or REGEXP in 20 billions rows.

I don't know which way is better ? What is your suggestion ?

Upvotes: 1

Views: 1381

Answers (2)

pbkhrv
pbkhrv

Reputation: 647

As Rockse said, you should stick with the 2nd model. As for large data sets, once you grow to a certain size that starts getting too big for one database instance, you need to start scaling "horizontally" (across several instances). One common way to scale such really large data sets is called "sharding": split your data set into subsets and store them on different database servers. Then come up with an algorithm that will tell your application which database to go to to fetch information about a certain product.

For example, let's split the data set into 4 chunks, ~5billion rows each. Then use the "product_id % 4" (that's modulo 4) as the "key" that will tell you which database instance contains information about that particular product. A very rough pseudo code might would look like this:

connections = []

function initConnections() {
   ... connect to 4 different databases or create pools ...
   connections = [conn1, conn2, conn3, conn4];
}

function getProductDbConnection(productId) {
  return connections[productId%4];
}

function getProductFeatures(productId) {
  conn = getProductDbConnection(productId);
  ... run whatever queries you need to get features ...
}

Here is an article that talks about how Instagram sharded their data to keep up with demand: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

Upvotes: 1

Abhishek Gupta
Abhishek Gupta

Reputation: 4166

First Model

It is not even in 1NF form, as it has non atomic values in features attribute. Moreover it would be really difficult to add, update or delete any new features in products table. So it won't be feasible at all.

Second Model

It is normalized till 5NF and looks good, for optimizing search use subquery and use indexing on product_id and feature_id. Try avoid using JOIN in such large data.

Upvotes: 1

Related Questions