punkish
punkish

Reputation: 15268

Postgres ENUM data type or CHECK CONSTRAINT?

I have been migrating a MySQL db to Pg (9.1), and have been emulating MySQL ENUM data types by creating a new data type in Pg, and then using that as the column definition. My question -- could I, and would it be better to, use a CHECK CONSTRAINT instead? The MySQL ENUM types are implemented to enforce specific values entries in the rows. Could that be done with a CHECK CONSTRAINT? and, if yes, would it be better (or worse)?

Upvotes: 155

Views: 161853

Answers (7)

Alexi Theodore
Alexi Theodore

Reputation: 1677

Something that hasn't been talked about much in this thread is storage costs. As table sizes become large, the cost of storing anything in literal form versus using an ENUM becomes significantly different. ENUMs are only 1-2 bytes to store - everything else is likely going to be much larger and that could add up to GB in "bloat" on large tables.

I believe all other methods described here would still be storing the tuple value in full (unless an INT primary key was used).

Upvotes: 0

Teocali
Teocali

Reputation: 2714

From my point of view, given a same set of values

  1. Enum is a better solution if you will use it on multiple column
  2. If you want to limit the values of only one column in your application, a check constraint is a better solution.

Of course, there is a whole lot of other parameters which could creep in your decision process (typically, the fact that builtin operators are not available), but I think these two are the most prevalent ones.

Upvotes: 5

punkish
punkish

Reputation: 15268

Based on the comments and answers here, and some rudimentary research, I have the following summary to offer for comments from the Postgres-erati. Will really appreciate your input.

There are three ways to restrict entries in a Postgres database table column. Consider a table to store "colors" where you want only 'red', 'green', or 'blue' to be valid entries.

  1. Enumerated data type

    CREATE TYPE valid_colors AS ENUM ('red', 'green', 'blue');
    
    CREATE TABLE t (
        color VALID_COLORS
    );
    

    Advantages are that the type can be defined once and then reused in as many tables as needed. A standard query can list all the values for an ENUM type, and can be used to make application form widgets.

    SELECT  n.nspname AS enum_schema,  
            t.typname AS enum_name,  
            e.enumlabel AS enum_value
    FROM    pg_type t JOIN 
            pg_enum e ON t.oid = e.enumtypid JOIN 
            pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE   t.typname = 'valid_colors'
    
     enum_schema | enum_name     | enum_value 
    -------------+---------------+------------
     public      | valid_colors  | red
     public      | valid_colors  | green
     public      | valid_colors  | blue
    

    Disadvantages are, the ENUM type is stored in system catalogs, so a query as above is required to view its definition. These values are not apparent when viewing the table definition. And, since an ENUM type is actually a data type separate from the built in NUMERIC and TEXT data types, the regular numeric and string operators and functions don't work on it. So, one can't do a query like

    SELECT FROM t WHERE color LIKE 'bl%'; 
    
  2. Check constraints

    CREATE TABLE t (
        colors TEXT CHECK (colors IN ('red', 'green', 'blue'))
    );
    

    Two advantage are that, one, "what you see is what you get," that is, the valid values for the column are recorded right in the table definition, and two, all native string or numeric operators work.

  3. Foreign keys

    CREATE TABLE valid_colors (
        id SERIAL PRIMARY KEY NOT NULL,
        color TEXT
    );
    
    INSERT INTO valid_colors (color) VALUES 
        ('red'),
        ('green'),
        ('blue');
    
    CREATE TABLE t (
        color_id INTEGER REFERENCES valid_colors (id)
    );
    

    Essentially the same as creating an ENUM type, except, the native numeric or string operators work, and one doesn't have to query system catalogs to discover the valid values. A join is required to link the color_id to the desired text value.

Upvotes: 255

Gord Stephen
Gord Stephen

Reputation: 1145

As other answers point out, check constraints have flexibility issues, but setting a foreign key on an integer id requires joining during lookups. Why not just use the allowed values as natural keys in the reference table?

To adapt the schema from punkish's answer:

CREATE TABLE valid_colors (
    color TEXT PRIMARY KEY
);

INSERT INTO valid_colors (color) VALUES 
    ('red'),
    ('green'),
    ('blue');

CREATE TABLE t (
    color TEXT REFERENCES valid_colors (color) ON UPDATE CASCADE
);

Values are stored inline as in the check constraint case, so there are no joins, but new valid value options can be easily added and existing values instances can be updated via ON UPDATE CASCADE (e.g. if it's decided "red" should actually be "Red", update valid_colors accordingly and the change propagates automatically).

Upvotes: 68

tomo7
tomo7

Reputation: 499

One of the big disadvantages of Foreign keys vs Check constraints is that any reporting or UI displays will have to perform a join to resolve the id to the text.

In a small system this is not a big deal but if you are working on a HR or similar system with very many small lookup tables then this can be a very big deal with lots of joins taking place just to get the text.

My recommendation would be that if the values are few and rarely changing, then use a constraint on a text field otherwise use a lookup table against an integer id field.

Upvotes: 6

quux00
quux00

Reputation: 14614

I'm hoping somebody will chime in with a good answer from the PostgreSQL database side as to why one might be preferable to the other.

From a software developer point of view, I have a slight preference for using check constraints, since PostgreSQL enum's require a cast in your SQL to do an update/insert, such as:

INSERT INTO table1 (colA, colB) VALUES('foo', 'bar'::myenum)

where "myenum" is the enum type you specified in PostgreSQL.

This certainly makes the SQL non-portable (which may not be a big deal for most people), but also is just yet another thing you have to deal with while developing applications, so I prefer having VARCHARs (or other typical primitives) with check constraints.

As a side note, I've noticed that MySQL enums do not require this type of cast, so this is something particular to PostgreSQL in my experience.

Upvotes: 2

Frank Heikens
Frank Heikens

Reputation: 127297

PostgreSQL has enum types, works as it should. I don't know if an enum is "better" than a constraint, they just both work.

Upvotes: 3

Related Questions