Reputation: 4331
I have a table in which 3 rows of data are added per second and in which I intend to keep around 30M rows. (Older data will be removed).
I need to add a column: varchar(1000). I can't tell in advance what it's content will be but I do know it will be very repetitive : thousands to millions of rows will have the same value. It is usually around 200 characters long.
Since data is being added using a Stored Procedure I see two option
I know some of the tradeoff between these two options but I have difficulty making up my mind on the question.
Option 1 is heavier but I get faster inserts. Requires less joins hence query are simpler. Option 2 is lighter insert take longers but query have the potential to be faster. I think I'm closer to normal form but then I also have a table with a single meaningful column.
From the information I gave you, which option seems better? (You can also come up with another option).
Upvotes: 0
Views: 345
Reputation: 294287
You should also investigate page compression, perhaps you can do the simple thing and still get a a small(ish) table. Although, if you say is SQL Express, you won't be able to use it as is an Enterprise Edition requirement.
I have used repeatedly in my projects your second approach. Every insert would have to go through a stored procedure that gets the lookup value id, or inserts a new one if not found and returns the id. Specially for such large columns like your seems to be, with plenty of rows yet so few distinct values, the saving in space should trump the additional overhead of the foreign key and lookup cost in query joins. See also Disk is Cheap... That's not the point!.
Upvotes: 2