What is the highest normal form this table is in?

Question

Ticket Vname Nname
1      Oli   Seitz
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
3      Tim   Hofmann

This table represents a mapping between persons (Vname, Nname, i.e. first name and last name) and tickets (Ticket). Vname and Nname together identify a person, but every person (Vname, Nname) can have multiple tickets (Ticket), and a ticket can be assigned to multiple people.

The PK of this table is all three columns together. So this table should be in 1NF, because there is no multi-dimensional data in any one column.
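A quick way to sanity-check the PK claim against the sample data is to look for the smallest column sets that contain no duplicate values. This is a Python sketch with my own helper names (`is_unique`, `minimal_unique_sets`); a single sample can only rule candidate keys out, because a key constrains every valid state of the table, not just this one.

```python
from itertools import combinations

HEADING = ("Ticket", "Vname", "Nname")
T = [
    (1, "Oli", "Seitz"),
    (1, "Andi", "Hofmann"),
    (2, "Oli", "Seitz"),
    (2, "Oli", "Schmidt"),
    (2, "Tim", "Schmidt"),
    (3, "Tim", "Hofmann"),
]


def is_unique(attrs):
    """True if no two rows of T share the same values for attrs."""
    idx = [HEADING.index(a) for a in attrs]
    values = [tuple(row[i] for i in idx) for row in T]
    return len(set(values)) == len(values)


def minimal_unique_sets():
    """Smallest column sets that are unique in this sample (key candidates)."""
    found = []
    for size in range(1, len(HEADING) + 1):
        for attrs in combinations(HEADING, size):
            # A superset of a set already found is unique but not minimal.
            if any(set(f) <= set(attrs) for f in found):
                continue
            if is_unique(attrs):
                found.append(attrs)
    return found


print(minimal_unique_sets())  # [('Ticket', 'Vname', 'Nname')]
```

Every pair of columns repeats somewhere in the sample (for example (2, Oli) appears twice), so only all three columns together are unique.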

But then I struggle. I think it is in 2NF and 3NF because I can't find any functional dependencies. (I hope they are called "functional dependencies" in English as well as in German.)

Can someone explain which is the highest NF this table is in, and why? And what would I have to change to get it into 5NF?

Note: This is not homework, this question emerged from a discussion.

philipxy · Accepted Answer

1NF (First Normal Form)

"1NF" has no standard meaning.

Since by definition a relation has one value per column per row, notions of "multi-dimensional data in one column" don't make sense. Feel free to ask people who say such things to make sense, and to explain how whatever they do mean matters.

Normalization to higher NFs (normal forms)

The only thing that normalization to higher NFs has to do with "1NF" is that they are both trying to simplify to improve designs.

Your relation satisfies no non-trivial FDs (functional dependencies). So it is in BCNF.
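A brute-force check of that claim on the sample data, as a Python sketch (the function names are my own). It tests every FD with a single attribute on the right, which is enough because an FD with several right-hand attributes holds exactly when each of its single-attribute parts does. One instance can only show that an FD fails; whether an FD holds as a constraint depends on the business rules.

```python
from itertools import combinations

HEADING = ("Ticket", "Vname", "Nname")
T = [
    (1, "Oli", "Seitz"),
    (1, "Andi", "Hofmann"),
    (2, "Oli", "Seitz"),
    (2, "Oli", "Schmidt"),
    (2, "Tim", "Schmidt"),
    (3, "Tim", "Hofmann"),
]


def fd_holds(lhs, rhs):
    """True if FD lhs -> rhs holds in this instance: equal lhs values imply equal rhs values."""
    seen = {}
    for row in T:
        x = tuple(row[HEADING.index(a)] for a in lhs)
        y = row[HEADING.index(rhs)]
        # setdefault stores y the first time x appears, else returns the earlier rhs value
        if seen.setdefault(x, y) != y:
            return False
    return True


def nontrivial_fds():
    """Every FD X -> a (a not in X, X possibly empty) that holds in this instance."""
    return [
        (lhs, rhs)
        for size in range(len(HEADING))
        for lhs in combinations(HEADING, size)
        for rhs in HEADING
        if rhs not in lhs and fd_holds(lhs, rhs)
    ]


print(nontrivial_fds())  # []
```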

Your relation satisfies no non-trivial MVDs (multi-valued dependencies), i.e. it satisfies no non-trivial binary JDs (join dependencies), i.e. it is not the join of the members of any pair of its projections other than a pair that includes itself. So it is in 4NF. You can see this by taking pairs of projections and joining them. You can also do it by applying the definitions of FD & MVD to identify the ones that hold, then applying the rules of inference for them.
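The pairwise check can be sketched in Python as follows. The helper names `rel`, `project` and `join` are my own; tuples are stored as frozensets of (attribute, value) pairs so that the natural join can match on attribute names. The sketch tries every pair of non-empty proper projections that together cover the heading and asks whether their join gives back exactly T.

```python
from itertools import combinations

HEADING = ("Ticket", "Vname", "Nname")
ROWS = [
    (1, "Oli", "Seitz"),
    (1, "Andi", "Hofmann"),
    (2, "Oli", "Seitz"),
    (2, "Oli", "Schmidt"),
    (2, "Tim", "Schmidt"),
    (3, "Tim", "Hofmann"),
]


def rel(heading, rows):
    """A relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(heading, row)) for row in rows}


def project(r, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r1, r2):
    """Natural join: union every pair of tuples that agree on their common attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out


T = rel(HEADING, ROWS)
# Non-empty proper subsets of the heading
SUBSETS = [c for k in range(1, len(HEADING)) for c in combinations(HEADING, k)]


def binary_jds_holding():
    """Pairs of proper projections that cover the heading and join back to exactly T."""
    found = []
    for a, b in combinations(SUBSETS, 2):
        if set(a) | set(b) != set(HEADING):
            continue  # the pair must cover every attribute
        if join(project(T, a), project(T, b)) == T:
            found.append((a, b))
    return found


print(binary_jds_holding())  # []
```

On the question's data every such join is strictly larger than T (between 8 and 15 rows versus 6), so no binary JD, and hence no non-trivial MVD, holds.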

Your relation satisfies the non-trivial JD *{{Ticket, Vname}, {Vname, Nname}, {Ticket, Nname}}, i.e. it is the join of the members of a set of its projections other than a set that includes itself. But that JD is not implied by its CKs (its only CK is {Ticket, Vname, Nname}), i.e. there is no chain of joins of its projections where every join's common attributes include a CK of the original. So it is not in 5NF. You can see this by taking sets of projections and joining them. There is no algorithm to determine which non-trivial JDs a relation satisfies with complexity better than brute force.
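Extending the same brute force from pairs to sets of projections gives a sketch of the 5NF check in Python. Only irredundant sets of proper projections are tried (no component contained in another), since adding a component that is a subset of another does not change the rows the join returns. `implied_by_keys` follows the chain-of-joins description: it repeatedly merges two components whose common attributes contain a CK and reports whether some merged component is the whole heading. The helper names are my own, and this is an illustration, not an efficient algorithm.

```python
from itertools import combinations

HEADING = ("Ticket", "Vname", "Nname")
ROWS = [
    (1, "Oli", "Seitz"),
    (1, "Andi", "Hofmann"),
    (2, "Oli", "Seitz"),
    (2, "Oli", "Schmidt"),
    (2, "Tim", "Schmidt"),
    (3, "Tim", "Hofmann"),
]
KEYS = [HEADING]  # the only CK: all three columns


def rel(heading, rows):
    return {frozenset(zip(heading, row)) for row in rows}


def project(r, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r1, r2):
    """Natural join on common attributes; tuples are frozensets of (attribute, value)."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out


T = rel(HEADING, ROWS)
SUBSETS = [c for k in range(1, len(HEADING)) for c in combinations(HEADING, k)]


def candidate_jds():
    """Sets of 2+ proper subsets covering the heading, none contained in another."""
    for n in range(2, len(SUBSETS) + 1):
        for comps in combinations(SUBSETS, n):
            covers = set().union(*comps) == set(HEADING)
            irredundant = not any(
                set(a) <= set(b) or set(b) <= set(a) for a, b in combinations(comps, 2)
            )
            if covers and irredundant:
                yield comps


def jd_holds(comps):
    """Brute force: join the projections onto every component and compare with T."""
    result = project(T, comps[0])
    for c in comps[1:]:
        result = join(result, project(T, c))
    return result == T


def implied_by_keys(comps):
    """Merge components whose common attributes contain a CK;
    the JD is implied if some merged component is the whole heading."""
    parts = [set(c) for c in comps]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(parts)), 2):
            if any(set(k) <= parts[i] & parts[j] for k in KEYS):
                parts[i] |= parts[j]
                del parts[j]
                merged = True
                break
    return any(p == set(HEADING) for p in parts)


holding = [jd for jd in candidate_jds() if jd_holds(jd)]
print(holding)  # [(('Ticket', 'Vname'), ('Ticket', 'Nname'), ('Vname', 'Nname'))]
print([implied_by_keys(jd) for jd in holding])  # [False]
```

The only JD found is the three-way one from the answer, and since the only CK is the whole heading, no two proper components can ever share a CK, so the JD is not implied by the CKs.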

Relation Meanings/Predicates

On the other hand, suppose you knew the relation's meaning, to the extent of knowing that it holds the tuples that make a true statement from a (characteristic) predicate expressible as the conjunction of others, say

    ticket Ticket was submitted by a person with first name Vname
AND there is a person with name Vname Nname
AND ticket Ticket was submitted by a person with last name Nname

Join is designed so that the predicate of its output is the AND of the predicates of its inputs. So you would know to check whether any corresponding decompositions of the original satisfy the JD (i.e. whether the relations from the conjuncts are projections of the original), and hence to check whether the JD is implied by the original's CKs.
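To make "the predicate of a join's output is the AND of its inputs' predicates" concrete, here is a small Python sketch. The relation names `SUBMITTED_FIRST`, `PERSON` and `SUBMITTED_LAST` are hypothetical, one per conjunct of the predicate above, and are assumed to hold exactly the rows of the appendix's projections. Evaluating the conjunction over every combination of values gives the same rows as an ordinary natural join, and the sketch then checks whether the conjuncts' relations are projections of the original.

```python
# Hypothetical base relations for the three conjuncts, filled (as an assumption)
# with the rows of the appendix's projections of T.
SUBMITTED_FIRST = {(1, "Oli"), (1, "Andi"), (2, "Oli"), (2, "Tim"), (3, "Tim")}  # (Ticket, Vname)
PERSON = {("Oli", "Seitz"), ("Andi", "Hofmann"), ("Oli", "Schmidt"),
          ("Tim", "Schmidt"), ("Tim", "Hofmann")}  # (Vname, Nname)
SUBMITTED_LAST = {(1, "Seitz"), (1, "Hofmann"), (2, "Seitz"),
                  (2, "Schmidt"), (3, "Hofmann")}  # (Ticket, Nname)

T = {
    (1, "Oli", "Seitz"),
    (1, "Andi", "Hofmann"),
    (2, "Oli", "Seitz"),
    (2, "Oli", "Schmidt"),
    (2, "Tim", "Schmidt"),
    (3, "Tim", "Hofmann"),
}

tickets = {t for t, _ in SUBMITTED_FIRST} | {t for t, _ in SUBMITTED_LAST}
vnames = {v for _, v in SUBMITTED_FIRST} | {v for v, _ in PERSON}
nnames = {n for _, n in PERSON} | {n for _, n in SUBMITTED_LAST}

# The AND of the three predicates, evaluated for every combination of values
conjunction = {
    (t, v, n)
    for t in tickets for v in vnames for n in nnames
    if (t, v) in SUBMITTED_FIRST and (v, n) in PERSON and (t, n) in SUBMITTED_LAST
}

# The natural join, computed by matching on shared attributes instead
natural_join = {
    (t, v, n)
    for t, v in SUBMITTED_FIRST
    for v2, n in PERSON
    if v == v2 and (t, n) in SUBMITTED_LAST
}

# Are the conjuncts' relations projections of T? If so, T satisfies the JD.
are_projections = (
    SUBMITTED_FIRST == {(t, v) for t, v, _ in T}
    and PERSON == {(v, n) for _, v, n in T}
    and SUBMITTED_LAST == {(t, n) for t, _, n in T}
)

print(conjunction == natural_join == T)  # True
print(are_projections)  # True
```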

The point of normalization to higher NFs is that a JD holds when a relation's predicate can be expressed as the conjunction of others and their relations are projections of the original. So we can use the simpler separate relations instead, except that we might as well JOIN/AND the relations/predicates on pairwise shared CKs, because the result still has no update anomalies. (If FD {x, ...} -> a holds, then a certain MVD holds, a certain binary JD holds, and the predicate of the relation can be expressed as ... AND a = f(x, ...).)
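A Python sketch of the parenthetical remark, on a hypothetical relation that is not from the question: `Ticket`, `Vname` and an invented `Priority` column, with data constructed so that the FD {Ticket} -> Priority holds. It checks that the binary JD *{{Ticket, Priority}, {Ticket, Vname}} then holds too (this direction is Heath's theorem), and that the relation equals "ticket/person pairs AND Priority = f(Ticket)" for the function f that the FD defines.

```python
# Hypothetical relation, NOT the question's table: people working on tickets,
# plus an invented Priority column chosen so that the FD Ticket -> Priority holds.
R = {
    (1, "Oli", "high"),
    (1, "Andi", "high"),
    (2, "Oli", "low"),
    (2, "Tim", "low"),
    (3, "Tim", "high"),
}  # columns: (Ticket, Vname, Priority)


def fd_ticket_to_priority(rel):
    """True if equal Ticket values always come with equal Priority values."""
    seen = {}
    for t, _, p in rel:
        if seen.setdefault(t, p) != p:
            return False
    return True


ticket_priority = {(t, p) for t, _, p in R}  # project Ticket, Priority (R)
ticket_person = {(t, v) for t, v, _ in R}    # project Ticket, Vname (R)

# Natural join of the two projections on their common attribute Ticket
joined = {(t, v, p) for t, v in ticket_person for t2, p in ticket_priority if t == t2}

# "... AND Priority = f(Ticket)": because the FD holds, f is a well-defined function
f = dict(ticket_priority)
via_function = {(t, v, f[t]) for t, v in ticket_person}

print(fd_ticket_to_priority(R))  # True
print(joined == R)               # True: the binary JD holds
print(via_function == R)         # True
```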

Note that, contrary to claims that the purpose of 5NF is to reduce update anomalies, it turned out that they disappear as of ETNF (essential tuple normal form), which lies between BCNF & 5NF. But a 5NF design is still simpler in the sense that there are fewer relations at the cost of adding ANDs to predicates. Note that MVDs & JDs that hold are hard to find only because designs with them are intuitively obviously bad, since their predicates are the conjunction of others, so such designs never get proposed. Thus, contrary to claims that 5NF is unimportant because violating JDs are rare, 5NF is the only NF that matters. (SQL systems don't support dealing with all the integrity constraints that can arise from 5NF designs, and that, together with ignorance, leads to claims that one should settle for 3NF.)

You need to find definitions of the NFs and why they matter.

More re predicates & the relational model.

(I only answered this question because received wisdom, even in textbooks, is such a mess.)

Appendix

Projections & joins. (I was going to leave the Minimal, Complete, and Verifiable Example to you, but another answerer disputed that the JD holds, so here is an sqlfiddle.)

T
1      Oli   Seitz
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
3      Tim   Hofmann

project Ticket, Vname (T)
1      Oli
1      Andi
2      Oli
2      Tim
3      Tim

project Vname, Nname (T)
Oli   Seitz
Andi  Hofmann
Oli   Schmidt
Tim   Schmidt
Tim   Hofmann

project Ticket, Vname (T) join project Vname, Nname (T)
1      Oli   Seitz
1      Oli   Schmidt
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
2      Tim   Hofmann
3      Tim   Schmidt
3      Tim   Hofmann

project Ticket, Nname (T)
1      Seitz
1      Hofmann
2      Seitz
2      Schmidt
3      Hofmann

     project Ticket, Vname (T) join project Vname, Nname (T)
join project Ticket, Nname (T)
1      Oli   Seitz
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
3      Tim   Hofmann
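As an alternative to the sqlfiddle, the appendix's steps can be reproduced in a few lines of Python. Relations are plain sets of tuples and each join is written as a comprehension; the variable names are my own. The sketch prints the row count of each intermediate result, the spurious rows produced by the first join, and whether the final join gives back T.

```python
T = {
    (1, "Oli", "Seitz"),
    (1, "Andi", "Hofmann"),
    (2, "Oli", "Seitz"),
    (2, "Oli", "Schmidt"),
    (2, "Tim", "Schmidt"),
    (3, "Tim", "Hofmann"),
}

# Projections onto pairs of columns (sets remove duplicate rows)
tv = {(t, v) for t, v, _ in T}  # project Ticket, Vname (T)
vn = {(v, n) for _, v, n in T}  # project Vname, Nname (T)
tn = {(t, n) for t, _, n in T}  # project Ticket, Nname (T)

# project Ticket, Vname (T) join project Vname, Nname (T): match on Vname
tv_vn = {(t, v, n) for t, v in tv for v2, n in vn if v == v2}

# ... join project Ticket, Nname (T): both of tn's attributes are already present,
# so this join just keeps the rows whose (Ticket, Nname) pair is in tn
tv_vn_tn = {(t, v, n) for t, v, n in tv_vn if (t, n) in tn}

print(len(tv), len(vn), len(tv_vn), len(tn), len(tv_vn_tn))  # 5 5 9 5 6
print(sorted(tv_vn - T))  # [(1, 'Oli', 'Schmidt'), (2, 'Tim', 'Hofmann'), (3, 'Tim', 'Schmidt')]
print(tv_vn_tn == T)  # True
```

The three spurious rows from the first join are exactly the ones the third projection removes, which is why the three-way JD holds even though no pair of projections suffices.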
