alishad
alishad

Reputation: 139

MySQL: Enum vs Varchar-with-Index

Suppose, I have a requirement to create a table where one of the column will be having a value from this limited and never-changing set: 'all', 'local', 'qa', 'staging', and 'production'.

Using enum data type for this situation looks like a suitable solution but after reading this article and some other threads on the internet, I feel discouraged to use it. So, if I do not want to create a lookup table and keeping the combination of evn and name unique is also a requirement then what is my best options between a column with ENUM type and a column with VARCHAR type but with an index created on it.

Also considering that the insertion in this table will be rare and we want this particular query to execute faster:

SELECT `enabled` FROM `features`
WHERE `name` = 'some_featuere'
AND `env` IN('all', 'qa')
ORDER BY `enabled` ASC limit 1;

Which one of these is a better design and why?

CREATE TABLE `features` (
  `id` INTEGER  NOT NULL AUTO_INCREMENT,
  `name` VARCHAR (50) NOT NULL,
  `env` ENUM('all', 'local', 'qa', 'staging', 'production') NOT NULL,
  `enabled` TINYINT(1) DEFAULT 0,
  `created_at` DATETIME,
  `updated_at` DATETIME,

  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_unq_features_name_env` (`name`,`env`)
);

OR

CREATE TABLE `features` (
  `id` INTEGER  NOT NULL AUTO_INCREMENT,
  `name` VARCHAR (50) NOT NULL,
  `env` VARCHAR(10) NOT NULL,
  `enabled` TINYINT(1) DEFAULT 0,
  `created_at` DATETIME,
  `updated_at` DATETIME,

  PRIMARY KEY (`id`),
  INDEX `idx_features_env` (`env`),
  UNIQUE KEY `idx_unq_features_name_env` (`name`,`env`)
);

Upvotes: 3

Views: 13477

Answers (3)

Rick James
Rick James

Reputation: 142278

This is a different Answer -- using SET datatype.

Here are the table changes:

`env` SET('local', 'qa', 'staging', 'production') NOT NULL,
PRIMARY KEY (`name`)

No secondary keys, no 'all', only a single row per name.

However, the test becomes messier. Or does it become simpler?? That is

AND `env` IN('all', 'qa')

-->

AND env = 'qa'

since a test is now for only one thing. The messy all is gone.

If all that fits with the business logic, then the table and the query are simpler and faster.

If I am wrong on the business logic, please elaborate. There may be a way to rescue this Answer (using SET and no IN).

Upvotes: 1

Rick James
Rick James

Reputation: 142278

There is a 'religious' war waging between the pro-ENUM and anti-ENUM factions. You have read one of the 'anti' articles. But many of the "evils" in that article may not apply to your situation.

Your query can be sped up by getting rid if the current PK id and replacing it by

PRIMARY KEY(name, env)

After that, no secondary indexes are needed.

What you have requires a lookup in a secondary key, then reach into the PK to get the third column. After that, sort and deliver one row.

Changing the PK avoids the extra lookup. And there should be no "downside" to the change.

If you have millions of rows in the table, and you are likely to have thousands of candidates for each env value asked for, then this will be faster, because it does not gather "thousands" of rows, sort them, only to deliver one. Instead it gets two rows and picks from them:

    (   SELECT  `enabled`
            FROM  `features`
            WHERE  `name` = 'some_featuere'
              AND  `env` = 'all'
            ORDER BY  `enabled` ASC
            limit  1
    )
    UNION DISTINCT
    (  SELECT  `enabled`
            FROM  `features`
            WHERE  `name` = 'some_featuere'
              AND  `env` = 'qa'
            ORDER BY  `enabled` ASC
            limit  1 
    )
    ORDER BY  `enabled`
    LIMIT  1;

Yes, the ORDER BY and LIMIT are repeated. I do not recommend this OR --> UNION for small datasets because there are many steps, each with overhead.

My Answer applies whether you use ENUM or some other method. It does assume InnoDB.

Upvotes: 1

Paul Campbell
Paul Campbell

Reputation: 1986

The short answer to your question is "neither" because your query would use an index on name/env in both scenarios. However, if I had to settle on one, I'd go for VARCHAR over ENUM as the lesser to two evils but I think there might be some other issues with your approach.

First, the VARCHAR option is only going to duplicate an issue with ENUM mentioned in that article (i.e. the addition of attributes or related data) while losing possibly the only advantage you might gain from ENUM, that of data integrity. You'd get data integrity via a lookup without the evilness of the ENUM.

Second, you might be focusing on a performance issue that doesn't exist with the query. How often is it run? How slow is it? As it stands, you have a index on NAME/ENV and the only way I can't think of to speed up the query would be a covering index to include ENABLED but I doubt it's a performance killer as it is and I suspect you'd see very little difference joining to a lookup table.

Third, 'ALL' as an option makes very little sense unless a feature should only ever be deployed in one environment at a time or in ALL of them simultaneously. If that doesn't hold true, you would necessarily have to delete all other records related to the feature name whenever you wanted to apply the 'ALL' option. 'ALL' would also prevent selectively enabling/disabling features in different environments or separately recording create/update events. That's introducing data management issues that don't need to exist.

Fourth, while the columns ID, NAME, CREATED_AT, UPDATED_AT are all attributes that appear to relate directly to a Feature. The columns ENV and ENABLED relate to where and how that Feature is deployed. At first glance, that suggests storing this data in a completely separate table (possibly with CREATED_AT and UPDATED_AT to indicate when they were first deployed and last updated). I'd personally go with Feature, Environment and Feature_Environment as separate tables with Foreign Keys from Feature_Environment to the other two.

Upvotes: 1

Related Questions