Reputation: 39
This question has been asked before, but I've not found a single definitive answer.
Is it better to do:
user_id | attribute_1 | attribute_2 | attribute_3 | attribute_4

or

user_id | attribute_1
user_id | attribute_2
user_id | attribute_3
user_id | attribute_4

One big table or many small tables? Each user can have only one value for attribute_X. We have a lot of data to store (100 million users). We are using InnoDB, and performance is really important for us (10,000 queries/s).
Thanks!
François
Upvotes: 0
Views: 1852
Reputation: 31950
1 big table with:

user_id | attribute_1 | attribute_2 | attribute_3 | attribute_4

will make your management easier. Otherwise you face too many individual lookups, which also complicates programming against the DB and increases the chance of application errors.
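A minimal sketch of that layout, assuming InnoDB and VARCHAR attributes since the question doesn't say what the attributes actually hold:

CREATE TABLE users (
    -- one row per user; the primary key makes the point lookup cheap
    user_id INT NOT NULL PRIMARY KEY,
    attribute_1 VARCHAR(255),
    attribute_2 VARCHAR(255),
    attribute_3 VARCHAR(255),
    attribute_4 VARCHAR(255)
) ENGINE=InnoDB;

-- a single primary-key lookup returns all four attributes at once
SELECT attribute_1, attribute_2, attribute_3, attribute_4
FROM users
WHERE user_id = 42;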
Upvotes: 0
Reputation: 211580
If you adhere to the Zero, One or Many principle, whereby there is either no such thing, one of them, or an unlimited number, you would always build properly normalized tables to track things like this.
For instance, a possible schema:
CREATE TABLE user_attributes (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    attribute_name VARCHAR(255) NOT NULL,
    attribute_value VARCHAR(255),
    -- enforces at most one value per attribute per user
    UNIQUE INDEX index_user_attributes_name (user_id, attribute_name)
);
This is the basic key-value store pattern where you can have many attributes per user.
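As a rough illustration of how the pattern is used ('country' is an invented attribute name, not from the question):

-- upsert: the unique index on (user_id, attribute_name) makes this
-- keep exactly one value per attribute per user
INSERT INTO user_attributes (user_id, attribute_name, attribute_value)
VALUES (42, 'country', 'FR')
ON DUPLICATE KEY UPDATE attribute_value = VALUES(attribute_value);

-- point lookup of a single attribute via the same index
SELECT attribute_value
FROM user_attributes
WHERE user_id = 42 AND attribute_name = 'country';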
Although the storage requirements for this are higher than for a fixed-column arrangement with the perpetually frustrating names like attribute1, the cost is small enough in the age of terabyte-sized hard drives that it's rarely an issue.
Generally you'd create a single table for this data until insertion time becomes a problem. So long as your inserts are fast, I wouldn't worry about it. At that point you would want to consider a sharding strategy to divide this data into multiple tables with an identical schema, but only if it's required.
I would imagine that would be at the ~10-50 million rows stage, but could be higher if the amount of insert activity in this table is relatively low.
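If it ever comes to that, here is a sketch of one possible table-per-shard layout; the four-way split and MOD routing are assumptions for illustration, not a recommendation for this workload:

-- hypothetical: four shards with an identical schema, cloned from the original
CREATE TABLE user_attributes_0 LIKE user_attributes;
CREATE TABLE user_attributes_1 LIKE user_attributes;
CREATE TABLE user_attributes_2 LIKE user_attributes;
CREATE TABLE user_attributes_3 LIKE user_attributes;
-- the application then routes each read/write to user_attributes_(user_id MOD 4)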
Don't forget that the best way to optimize for read activity is to use a cache: the fastest database query is the one you don't make. For that sort of thing you usually employ something like memcached to store the results of previous fetches, and you invalidate the cached entry on a write.
As always, benchmark any proposed schema at production scale.
Upvotes: 1