user1892972

Reputation: 39

MySQL: multiple tables or one big table?

This question has been asked before, but I haven't found a single, definitive answer.

Is it better to have one big table:

user_id | attribute_1 | attribute_2 | attribute_3 | attribute_4

or many small tables:

user_id | attribute_1

user_id | attribute_2

user_id | attribute_3

user_id | attribute_4

Each user can have only one value for attribute_X. We have a lot of data to store (100 million users). We are using InnoDB, and performance is really important for us (10,000 queries/s).

Thanks!

François

Upvotes: 0

Views: 1852

Answers (2)

Chris Halcrow

Reputation: 31950

One big table with:

user_id | attribute_1 | attribute_2 | attribute_3 | attribute_4

will make your management easier. Otherwise you face too many individual lookups, which also complicates programming against the DB and increases the chance of application errors.
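A minimal sketch of that wide table, assuming INT ids and VARCHAR attributes (the column types and names are placeholders, since the question doesn't specify them):

CREATE TABLE users (
  -- One row per user; each attribute gets a dedicated, nullable column.
  user_id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
  attribute_1 VARCHAR(255),
  attribute_2 VARCHAR(255),
  attribute_3 VARCHAR(255),
  attribute_4 VARCHAR(255)
);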

Upvotes: 0

tadman

Reputation: 211580

If you adhere to the Zero, One or Many principle, whereby any given thing occurs either zero times, once, or an unlimited number of times, you would always build properly normalized tables to track things like this.

For instance, a possible schema:

CREATE TABLE user_attributes (
  id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
  user_id INT NOT NULL,
  attribute_name VARCHAR(255) NOT NULL,
  attribute_value VARCHAR(255),
  -- One row per (user, attribute) pair: enforces a single value per attribute_X
  UNIQUE INDEX index_user_attributes_name(user_id, attribute_name)
);

This is the basic key-value store pattern where you can have many attributes per user.
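For example, writes and reads against that table might look like this (the user id, attribute name, and value are placeholders):

-- Set one attribute for a user; the unique index turns a duplicate
-- (user_id, attribute_name) pair into an update, which enforces the
-- "one value per attribute per user" requirement.
INSERT INTO user_attributes (user_id, attribute_name, attribute_value)
VALUES (42, 'attribute_1', 'some value')
ON DUPLICATE KEY UPDATE attribute_value = VALUES(attribute_value);

-- Fetch all attributes for one user; this walks the unique index.
SELECT attribute_name, attribute_value
FROM user_attributes
WHERE user_id = 42;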

Although the storage requirements for this are higher than for a fixed-column arrangement with perpetually frustrating names like attribute1, the cost is small enough in the age of terabyte-sized hard drives that it's rarely an issue.

Generally you'd create a single table for this data until insertion time becomes a problem. So long as your inserts are fast, I wouldn't worry about it. When that does become a problem, you'd want to consider a sharding strategy that divides this data into multiple tables with an identical schema, but only if it's required.

I would imagine that would be at the ~10-50 million rows stage, but could be higher if the amount of insert activity in this table is relatively low.
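As a rough sketch of what sharding could look like if you route on user_id (the shard count of 4 and the table names here are illustrative assumptions, not a recommendation):

-- Identical schemas, one table per shard; the application routes each
-- query to shard number (user_id % 4).
CREATE TABLE user_attributes_0 LIKE user_attributes;
CREATE TABLE user_attributes_1 LIKE user_attributes;
CREATE TABLE user_attributes_2 LIKE user_attributes;
CREATE TABLE user_attributes_3 LIKE user_attributes;

-- e.g. user_id 42 routes to 42 % 4 = 2:
SELECT attribute_name, attribute_value
FROM user_attributes_2
WHERE user_id = 42;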

Don't forget that the best way to optimize for read activity is to use a cache: the fastest database query is the one you don't make. For that sort of thing you'd usually employ something like memcached to store the results of previous fetches, invalidating the cached entry on a write.

As always, benchmark any proposed schema at production scale.

Upvotes: 1
