Mike Chaliy
Mike Chaliy

Reputation: 26658

What are the implications of storing enum values as string in relational database(SQLServer)

So I need to deign data model for product lifecycle with status field. Status could be say: Production, Service, Warranty, etc.

enter image description here

Classic way is just to have separate table Status, and foreign key to Status. However I am thinking of having just string value in this column. Implications I understand are:

  1. Lack of value validation - this is OK in my case, because application code control this values.
  2. Probably more storage - not sure if this correct. Does SQL Server have some optimizations?
  3. Indexes - not sure here. Is there anything problematic?
  4. Performance - not sure here. Is there huge performance differences?

Main benefits are:

  1. Readable/Semantics - no matter what data in table will be immediately readable.
  2. More easy to use - adding new value to enum is just adding enum at application level. No need of tedious configurations in ORM.

Do I miss something? Is implications that I outlined are real? Your suggestions?

Upvotes: 3

Views: 1300

Answers (2)

Vitalii Bondarenko
Vitalii Bondarenko

Reputation: 31

Unfortunately this solution will work as OLTP DB Storage for your application only. If you need to use the table for aggregations (Reporting, External Dashboarding, Logging and monitoring, etc) or sharing data with other applications (ETL to DWH, Replication to another DBs, Sync of scaled nodes, etc) you'll face several serious issues:

  1. Slowly Changing Dimensions (https://en.wikipedia.org/wiki/Slowly_changing_dimension). For instance you'd like to change "Service" to "In Progress". It means that some data still have old name and new rows will take new name. In same cases you need to know as the name has been changed, in some cases you need to see them as one status (for agregations). As a compromise you can add new column StatusId, or even create statuses like "2-Service", "2-In Progress".
  2. Master Data Management (https://en.wikipedia.org/wiki/Master_data_management) If external routine (a report from a reporting tool, excel, tableau, ETL, etc) would like to know the actual dictionary for column status, you'll need to create a view that provides last value for each status. So that it would be better to have table with master data in DB.

Some day you'll need to split your Data out from your application as your data become more valuable than the legacy code, so that it would be better to be prepared for that day in advance.

Upvotes: 3

Denis Reznik
Denis Reznik

Reputation: 964

  1. I suppose that there is a finite amount of Status values, so validation in DB can be done using the CHECK constraint. Foreign Key constraint does pretty much the same but using values from the referenced table and locks on them.

    ALTER TABLE Products ADD CONSTRAINT CH_PRODUCTS_STATUS CHECK ( Status = 'Production' OR Status = 'Service' ) GO

  2. Yes. More storage anyway. Int value will take 4 bytes, while text value 'Services' will take 8 bytes in varchar datatype and 16 bytes in nvarchar. There is a page compression option in Enterprise Edition (in Standard also since SQL Server 2016 SP1) https://msdn.microsoft.com/en-us/library/cc280464.aspx, As you see her and as far as I understand this compression algorithm, the difference between int and varchar/nvarchar compression will be the header size, so if you have a small finite options for the Status value, compression will give you pretty equal results. I have created two tables (clustered index) with the same structure but with deferent types for the Status column, and insert 100K rows in each of them.

    CREATE TABLE Products ( Id int IDENTITY PRIMARY KEY, Product varchar(200) NOT NULL, Status varchar(50) NOT NULL, Date datetime NOT NULL ) GO

    CREATE TABLE Products2 ( Id int IDENTITY PRIMARY KEY, Product varchar(200) NOT NULL, Status int NOT NULL, Date datetime NOT NULL ) GO

    INSERT INTO Products VALUES ('5656', 'Service', GETDATE()) GO 100000

    INSERT INTO Products2 VALUES ('5656', 1, GETDATE()) GO 100000

Here is some statistics for them:

Products Table Size: 543 pages (4.2 MB) Products2 Table Size: 482 pages (3.7 MB) Products Compressed (page compression) Table Size: 173 pages Products2 Compressed (page compression) Table Size: 173 pages

Note: varchar datatype used in this demo, nvarchar will require twice more capacity than varchar.

  1. The same size consideration as it was in the previous question. Here is the script for indexes creation:

    CREATE INDEX IX_PRODUCTS_STATUS ON Products (Status) GO

    CREATE INDEX IX_PRODUCTS2_STATUS ON Products2 (Status) GO

Products Index Size: 260 pages Products2 Index Size: 174 pages (3.7 MB) Products Index (page compression) Table Size: 93 pages Products2 Index(page compression) Table Size: 93 pages

  1. Here is the main point of consideration, why I have described previous two points with a lot of details. Storing strings in the same table require more resources:
    • Disk space
    • Processor time and disk IO for reading data from the disk
    • RAM for reading and cache the data

Available RAM can be a good indicator for storing the status in the Products table or move it to the Statuses table because reading from RAM is still much faster than reading from Disk. The size difference can be calculated using the length of the values of the status column. If size difference is significant and SQL Server will not be able to hold active data in RAM, he will read data from disk and flush out other, probably important, data from RAM. In this case, it will make sense to move statuses values to the separate table or enable compression (if this option is available).

But at the same time, if you will have separate Statuses table, probably it will make sense to create a foreign key from Products to Statuses table. In this case, data modification operations will be slow a bit by the lookup to the Statuses table. Also, you will have +1 join, and small CPU overhead for the physical join operation (and usually cabe disregarded).

Upvotes: 5

Related Questions