Reputation: 1650
I'm going to create a table which will have an amount of rows between 1000-20000, and I'm having fields that might repeat a lot... about 60% of the rows will have this value, where about each 50-100 have a shared value.
I've been concerned about efficiency lately and I'm wondering whether it would be better to store this string on each row (it would be between 8-20 characters) or to create another table and link them with its representative ID instead... So having ~1-50 rows in this table replacing about 300-5000 strings with ints?
Is this a good approach, or at all even neccessary?
Upvotes: 1
Views: 580
Reputation: 700342
Yes, it's a good approach in most circumstances. It's called normalisation, and is mainly done for two reasons:
I can't tell from your question what the reason would be in your case.
The difference between the two is that the first reuses values that just happen to look the same, while the second connects values that have the same meaning. The practical difference is in what should happen if a value changes, i.e. if the value changes for one record, should the value itself change so that it changes for all other records also using it, or should that record be connected to a new value so that the other records are left unchanged.
If it's for the first reason then you will save space in the database, but it will be more complicated to update records. If it's for the second reason you will not only save space, but you will also reduce the risk of inconsistency, as a value is only stored in one place.
Upvotes: 2
Reputation: 1329
Avitus is right, that it's generally a good practice to create lookups.
Think about the JOINS you will use this table in. 1000-20000 rows are not a lot to be handled by MySQL. If you don't have any, I would not bother about the lookups, just index the column.
BUT as soon as you start joining the table with others (of the same size) that's where the performance loss comes in, which you can (most likely) compensate by introducing lookups.
Upvotes: 1
Reputation: 14233
I would recommend using an int
with a foreign key to a lookup table (like you describe in your second scenario). This will cause the index to be much smaller than indexing a VARCHAR so the storage required would be smaller. It should perform better, too.
Upvotes: 1
Reputation: 15958
That is a good approach to have a look-up table for the strings. That way you can build more efficient indexes on the integer values. It wouldn't be absolutely necessary but as a good practice I would do that.
Upvotes: 1