Reputation: 19467
Imagine a hypothetical database that stores products. Each product has 100 attributes, although any given product will only have values set for ~50 of these. I can see three ways to store this data:
A single table with 100 columns.
A single table with very few columns (say, the 10 that have a value for every product), plus another table with columns (product_id, attribute, value), i.e. an EAV data store.
A separate table for every column. So the core products table might have 2 columns, and there would be 98 other tables, each with two columns (product_id, value).
Setting aside the shades of grey between these extremes, from a pure efficiency standpoint, which is best to use? I assume it depends on the types of queries being run, i.e. whether most queries fetch several attributes of one product, or the value of a single attribute across many products. How does this affect the efficiency?
Assume this is a MySQL database using InnoDB, all tables have appropriate foreign keys, and there is an index on product_id. Imagine that the attribute names and values are strings and are not indexed.
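For concreteness, here is a minimal sketch of options 2 and 3 in MySQL; the table and column names are illustrative, not an actual schema (option 1 would simply be one CREATE TABLE with ~100 columns):

-- Option 2: an EAV store (hypothetical names)
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name       VARCHAR(255)
    -- ...plus the ~10 columns that every product has
) ENGINE=InnoDB;

CREATE TABLE product_attributes (
    product_id INT         NOT NULL,
    attribute  VARCHAR(64) NOT NULL,
    value      VARCHAR(255),
    INDEX (product_id),   -- per the question, only product_id is indexed
    FOREIGN KEY (product_id) REFERENCES products (product_id)
) ENGINE=InnoDB;

-- Option 3: one table per attribute, e.g. a hypothetical "colour" attribute
CREATE TABLE product_colour (
    product_id INT PRIMARY KEY,
    value      VARCHAR(255),
    FOREIGN KEY (product_id) REFERENCES products (product_id)
) ENGINE=InnoDB;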
In a general sense, I am asking whether accessing a really big table takes more or less time than a query with many joins.
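To make that concrete, a hypothetical lookup of three attributes of one product would look roughly like this under each layout (products_wide, product_colour, etc. are illustrative names):

-- Option 1 (wide table): one row read, no joins
SELECT colour, weight, width
FROM   products_wide
WHERE  product_id = 42;

-- Option 3 (per-attribute tables): one join per attribute requested
SELECT c.value AS colour, w.value AS weight, d.value AS width
FROM   products p
LEFT JOIN product_colour c ON c.product_id = p.product_id
LEFT JOIN product_weight w ON w.product_id = p.product_id
LEFT JOIN product_width  d ON d.product_id = p.product_id
WHERE  p.product_id = 42;

-- Option 2 (EAV): one row per attribute, pivoted in the application
SELECT attribute, value
FROM   product_attributes
WHERE  product_id = 42
  AND  attribute IN ('colour', 'weight', 'width');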
I found a similar question here: Best to have hundreds of columns or split into multiple tables?
The difference is that that question asks about a specific case and doesn't really tell me about efficiency in the general case. Other similar questions all talk about the best way to organize the data; I just want to know how the different organizational systems affect the speed of queries.
Upvotes: 1
Views: 1912
Reputation: 51494
In general, the more tables you have, the more normalised and correct your design is, and hence the better (i.e. less redundancy in the data).
If you later find you have problems reporting on this data, that may be the moment to consider adding denormalised values to address specific performance issues. Adding denormalised values later is much less painful than normalising an existing badly designed database.
In most cases, EAV is a querying and maintenance nightmare.
An outline design would be a table for Products, a table for Attributes, and a ProductAttributes table that contains the ProductID and AttributeID of the relevant entries.
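A minimal sketch of that outline might look like this (names are illustrative; the Value column is an assumption for attributes that carry values, since only the two ID columns are named above):

CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    Name      VARCHAR(255)
) ENGINE=InnoDB;

CREATE TABLE Attributes (
    AttributeID INT PRIMARY KEY,
    Name        VARCHAR(64)
) ENGINE=InnoDB;

CREATE TABLE ProductAttributes (
    ProductID   INT NOT NULL,
    AttributeID INT NOT NULL,
    Value       VARCHAR(255),   -- assumed, for attributes that have a value
    PRIMARY KEY (ProductID, AttributeID),
    FOREIGN KEY (ProductID)   REFERENCES Products (ProductID),
    FOREIGN KEY (AttributeID) REFERENCES Attributes (AttributeID)
) ENGINE=InnoDB;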
Upvotes: 1
Reputation: 52107
In a general sense, I am asking whether accessing a really big table takes more or less time than a query with many joins.
JOIN will be slower.
However, if you usually query only a specific subset of columns, and this subset is "vertically partitioned" into its own separate table, querying such a "lean" table is typically quicker than querying the "fat" table with all the columns.
But this is a very specific and fragile situation (easy to break as the system evolves), and you should test very carefully before going down that path. Your default starting position should be a single table.
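As a sketch of such vertical partitioning (all names here are hypothetical):

-- Frequently queried columns in a lean table, the rest in a separate one
CREATE TABLE products_core (
    product_id INT PRIMARY KEY,
    name       VARCHAR(255),
    price      DECIMAL(10,2)
) ENGINE=InnoDB;

CREATE TABLE products_extra (
    product_id       INT PRIMARY KEY,
    long_description TEXT,
    -- ...the remaining rarely queried attribute columns
    FOREIGN KEY (product_id) REFERENCES products_core (product_id)
) ENGINE=InnoDB;

-- Most queries touch only the lean table...
SELECT product_id, name, price FROM products_core WHERE product_id = 42;

-- ...and the occasional query pays for one join
SELECT c.name, e.long_description
FROM   products_core  c
JOIN   products_extra e ON e.product_id = c.product_id
WHERE  c.product_id = 42;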
Upvotes: 5
Reputation: 3511
As you mentioned, it depends heavily on the queries that will be executed against this data. As you know, joins are costly for the database; I can't imagine making 50-60 joins just to read a simple record. In my humble opinion it would be madness. :) The best thing you can do is load some test data and check your specific queries in a tool such as the Estimated Execution Plan in Management Studio; MySQL has a similar tool (EXPLAIN).
I would tend to advise you to avoid creating so many tables; I think it is bound to cause problems in the future. Maybe it is possible to move rarely used data into separate tables, or to use complex types? For string data you can try non-clustered indexes.
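In MySQL you can inspect the plans with EXPLAIN, for example (the table names are the hypothetical ones from the question):

-- Plan for the single wide-table lookup
EXPLAIN
SELECT colour, weight FROM products_wide WHERE product_id = 42;

-- Plan for the equivalent multi-join query
EXPLAIN
SELECT c.value, w.value
FROM   products p
LEFT JOIN product_colour c ON c.product_id = p.product_id
LEFT JOIN product_weight w ON w.product_id = p.product_id
WHERE  p.product_id = 42;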
Upvotes: 0