Is it a good idea to define UNIQUE constraint over all columns except primary key?

Question

I want a way to quickly check the uniqueness of the content of all columns in a table (excluding the ID) before inserting a new row. A UNIQUE constraint involving all those columns would be pretty much what I need. However, without knowing too much about how relational databases work, I don't know if it's a good practice (in terms of efficiency, etc.) to define a constraint like that.

Another, more "manual" way would be to hash all the content of a row, save the hash into another column and define a unique index for that column. But I would expect that databases probably already have that functionality built-in.

Gordon Linoff · Accepted Answer

A unique constraint is implemented using a unique index, a b-tree index in all databases I'm aware of.

Such an index would essentially double the size of your data. That may not be a concern. It would quickly -- and accurately -- determine if there are duplicates.

Note that index comparisons are based on the "natural" comparisons of data types. In particular, this can affect strings where the collation is taken into account. And spaces at the end of strings are probably ignored. A hash function approach cannot do this easily.

In addition, indexes will handle NULL values consistently (although that depends on the database). Hash functions require special treatment for NULL values.

I would caution against using unique indexes on floating point values directly. The issue is that two values can look the same, but actually be different (say, 0.9999999999999 and 1). Often, you really want some threshold -- something that is tricky both for unique constraints and for hashes.

From what you describe, I would suggest a unique index on all the columns if that is what you need. Such an index is the "built-in" way to do what you describe.

Is it a good idea to define UNIQUE constraint over all columns except primary key?

Answers (1)

Related Questions