Reputation: 26658
So I need to deign data model for product lifecycle with status field. Status could be say: Production, Service, Warranty, etc.
Classic way is just to have separate table Status, and foreign key to Status. However I am thinking of having just string value in this column. Implications I understand are:
Main benefits are:
Do I miss something? Is implications that I outlined are real? Your suggestions?
Upvotes: 3
Views: 1300
Reputation: 31
Unfortunately this solution will work as OLTP DB Storage for your application only. If you need to use the table for aggregations (Reporting, External Dashboarding, Logging and monitoring, etc) or sharing data with other applications (ETL to DWH, Replication to another DBs, Sync of scaled nodes, etc) you'll face several serious issues:
Some day you'll need to split your Data out from your application as your data become more valuable than the legacy code, so that it would be better to be prepared for that day in advance.
Upvotes: 3
Reputation: 964
I suppose that there is a finite amount of Status values, so validation in DB can be done using the CHECK constraint. Foreign Key constraint does pretty much the same but using values from the referenced table and locks on them.
ALTER TABLE Products ADD CONSTRAINT CH_PRODUCTS_STATUS CHECK ( Status = 'Production' OR Status = 'Service' ) GO
Yes. More storage anyway. Int value will take 4 bytes, while text value 'Services' will take 8 bytes in varchar datatype and 16 bytes in nvarchar. There is a page compression option in Enterprise Edition (in Standard also since SQL Server 2016 SP1) https://msdn.microsoft.com/en-us/library/cc280464.aspx, As you see her and as far as I understand this compression algorithm, the difference between int and varchar/nvarchar compression will be the header size, so if you have a small finite options for the Status value, compression will give you pretty equal results. I have created two tables (clustered index) with the same structure but with deferent types for the Status column, and insert 100K rows in each of them.
CREATE TABLE Products ( Id int IDENTITY PRIMARY KEY, Product varchar(200) NOT NULL, Status varchar(50) NOT NULL, Date datetime NOT NULL ) GO
CREATE TABLE Products2 ( Id int IDENTITY PRIMARY KEY, Product varchar(200) NOT NULL, Status int NOT NULL, Date datetime NOT NULL ) GO
INSERT INTO Products VALUES ('5656', 'Service', GETDATE()) GO 100000
INSERT INTO Products2 VALUES ('5656', 1, GETDATE()) GO 100000
Here is some statistics for them:
Products Table Size: 543 pages (4.2 MB) Products2 Table Size: 482 pages (3.7 MB) Products Compressed (page compression) Table Size: 173 pages Products2 Compressed (page compression) Table Size: 173 pages
Note: varchar datatype used in this demo, nvarchar will require twice more capacity than varchar.
The same size consideration as it was in the previous question. Here is the script for indexes creation:
CREATE INDEX IX_PRODUCTS_STATUS ON Products (Status) GO
CREATE INDEX IX_PRODUCTS2_STATUS ON Products2 (Status) GO
Products Index Size: 260 pages Products2 Index Size: 174 pages (3.7 MB) Products Index (page compression) Table Size: 93 pages Products2 Index(page compression) Table Size: 93 pages
Available RAM can be a good indicator for storing the status in the Products table or move it to the Statuses table because reading from RAM is still much faster than reading from Disk. The size difference can be calculated using the length of the values of the status column. If size difference is significant and SQL Server will not be able to hold active data in RAM, he will read data from disk and flush out other, probably important, data from RAM. In this case, it will make sense to move statuses values to the separate table or enable compression (if this option is available).
But at the same time, if you will have separate Statuses table, probably it will make sense to create a foreign key from Products to Statuses table. In this case, data modification operations will be slow a bit by the lookup to the Statuses table. Also, you will have +1 join, and small CPU overhead for the physical join operation (and usually cabe disregarded).
Upvotes: 5