whymatter
whymatter

Reputation: 775

Which highest normal form is this table in?

Ticket Vname Nname
1      Oli   Seitz
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
3      Tim   Hofmann

This table represents a mapping of persons (Vname, Nname) and tickets (Ticket). Vname and Nname together identify a person, but every Person (Vname, Nname) can have multiple tickets (Ticket), and a ticket can be assigned to multiple people.

The PK in this table are all three columns together. So this table should be 1NF because there is no multi dimensional data in one column.

But then I struggle. I think it is 2NF and 3NF because I can´t find any functional dependencies. (Hope they are called functional in English as well as in German)

Can someone explain which highest NF this table is and why? And what would I have to change to make it in 5NF?

Note: This is not homework, this question emerged from a discussion.

Upvotes: 1

Views: 1180

Answers (2)

Damir Sudarevic
Damir Sudarevic

Reputation: 22187

The predicate is:

[p1] Person named (FirstName) (LastName) is assigned ticket number (Ticket).

Constraints:

  • (c1.1) Person is identified by first name and last name.
  • (c1.2) Ticket is identified by ticket number.
  • (c1.3) For each person, that person may be assigned more than one ticket.
  • (c1.4) For each ticket, that ticket may be assigned to more than one person.
PersonTicket {FirstName, LastName, Ticket}
         KEY {FirstName, LastName, Ticket}

NF Reasoning:

  1. The table is all-key, hence it is in the BCNF.
  2. It is in the BCNF, it is not possible to reduce redundancy by decomposition, hence it is in the 5th NF.
  3. It is in the 5th NF and all key, hence it is in the 6th NF.

Note that the reasoning about NF does not require any data, it must be valid for an empty relational variable (table) and any possible valid values that the table may hold.

Also note that the predicate [p1] is a simple predicate, it generalises simple (elementary) facts, which is a telltale sign of the 6th NF. For example, the fact "Person named Oli Seitz is assigned ticket number 1." can not be split into two sentences without losing information.


EDIT

It has been a long discussion (see comments), so I have to expand the answer a bit.

My basic premise is that the table {Ticket, Vname, Nname} can not be decomposed into projections, or technically speaking that join dependency:

JD *{{Ticket, Vname}, {Vname, Nname}, {Ticket, Nname}}

does not hold. (note: Vname = first name, Nname =last name).

So let's take a closer look at this. What would it mean for the JD to hold? The JD can be verbalized as :

IF a person with a first name is assigned a ticket number AND a person with a last name is assigned that ticket number AND there exist a person with that first name and that last name THEN that person with that first name and that last name is assigned that ticket number.

As you first read this, it may feel right, -- for a minute or two -- but is plain wrong.

Take a look at the example:

     Original       |             Three Projections
                    |
(T,  First,  Last ) | (T, First)  (First,   Last )  (T,   Last )
----------------------------------------------------------------- 
(1, 'Tom', '.....') | (1, 'Tom')  ('Tom', '.....')  (1, '.....')
(1, '...', 'Jones') | (1, '...')  ('...', 'Jones')  (1, 'Jones')
(2, 'Tom', 'Jones') | (2, 'Tom')  ('Tom', 'Jones')  (2, 'Jones')

Note that join of tuples

(1, 'Tom) ('Tom', 'Jones') (1, 'Jones')

produces an extra tuple

(1, 'Tom', 'Jones')

which is not in the original; hence the JD does not hold.

To put it loosely, a person identifier (First, Last) is split. So, person's first name and last name are acting independently of each other; a severe logical error.

It seems that the sample data in the original question was carefully crafted to appear the JD holds; the original data-sample will pass this test. However, insert just one tuple (row) in the original table:

(1, 'Tim',  'Seitz')

Now test again, this one generates two extra tuples when joining projections:

(1, 'Tim', 'Hofmann')
(2, 'Tim', 'Seitz')

In the reasoning section I have used terms like: simple predicate, elementary fact, information, identifier, and reducing redundancy. In the comments below these have been labelled as: fuzzy non-technical, unsound, wrong, vague, irrelevant, and nonsensical.

It is interesting that these fuzzy, vague, nonsensical, etc. terms lead to the correct result, as opposed to a "precise, highly technical, (algorithm assisted ?)" method which can result in severe logical errors.

So, for those who would like to take a closer look at these -- IMHO extremely useful and practical -- terms, I can suggest a few references.

[1] About simple predicates and 6th NF:

C. J. Date; Database Design and Relational Theory;
Part III, Chapter 13, Sixth Normal Form

[2] About elementary facts:

Terry Halpin, Tony Morgan; Information Modeling and Relational Databases;
Chapter 3, Conceptual Modeling: First Steps

Terry Halpin, What is an Elementary Fact?, paper 1993.

[3] About 5th NF and reducing redundancy:

"To say that relvar R is in 5th NF is to say further non loss decomposition of R into projections may be possible, but it won't eliminate any redundancies."

"5NF guarantees freedom from redundancies that can be removed via projection."

C. J. Date; Master Class (video): Database Design and Relational Theory;
Normalization JDs and 5NF (formal)-Part 2 of 2

[4] About predicates and propositions:

Part I, Chapter 2, Predicates and Propositions
Part V, Chapter 15, Database Design is Predicate Design
in : C. J. Date; Database Design and Relational Theory.

"Conceptually, the database is a set of sentences expressing propositions taken to be true of the UoD."

Terry Halpin, Tony Morgan; Information Modeling and Relational Databases;
Chapter 2, Information Levels and Frameworks


EDIT 2

And finally, here is some code (PostgreSQL) to test.

CREATE TABLE tbl_0 (
  Ticket integer  NOT NULL
, Vname  text     NOT NULL
, Nname  text     NOT NULL

, CONSTRAINT pk_tbl PRIMARY KEY (Ticket, Vname, Nname)
);

INSERT INTO tbl_0 (Ticket, Vname, Nname)
VALUES
  (1, 'Oli',  'Seitz')
, (1, 'Andi', 'Hofmann')
, (2, 'Oli',  'Seitz')
, (2, 'Oli',  'Schmidt')
, (2, 'Tim',  'Schmidt')
, (3, 'Tim',  'Hofmann')
;

The following query generates three projections, joins them together and shows the difference between the original table and joined projections. In other words, does join dependency
JD *{{Ticket, Vname}, {Vname, Nname}, {Ticket, Nname}} hold?

WITH
p_1 AS ( -- projection {Ticket, Vname}
    SELECT DISTINCT Ticket, Vname FROM tbl_0
),
p_2 AS ( -- projection {Vname, Nname}
    SELECT DISTINCT Vname, Nname FROM tbl_0
),
p_3 AS ( -- projection {Ticket, Nname}
    SELECT DISTINCT Ticket, Nname FROM tbl_0
),
j_pro AS ( -- join projections
    SELECT Ticket, Vname, Nname
    FROM p_1
    JOIN p_2 USING (Vname)
    JOIN p_3 USING (Ticket, Nname)
),
d_0 AS ( -- tbl_0 - j_pro
    SELECT Ticket, Vname, Nname FROM tbl_0
    EXCEPT
    SELECT Ticket, Vname, Nname FROM j_pro
),
d_1 AS ( -- j_pro - tbl_0
    SELECT Ticket, Vname, Nname FROM j_pro
    EXCEPT
    SELECT Ticket, Vname, Nname FROM tbl_0
)
-- diff = (tbl_0 - j_pro) union (j_pro - tbl_0)
SELECT Ticket, Vname, Nname FROM d_0
UNION
SELECT Ticket, Vname, Nname FROM d_1

ORDER BY Ticket, Vname, Nname
;

In the first run, all looks ok -- appears that the JD holds, no difference:

+--------+-------+---------+
| ticket | vname | nname   |
+--------+-------+---------+

However it's a trick question, the example was carefully crafted to produce this result. All it takes is adding one more row to reveal the problem:

INSERT INTO tbl_0 (Ticket, Vname, Nname)
VALUES (1, 'Tim', 'Seitz');

And now we get two extra tuples (run the query again).

+--------+-------+---------+
| ticket | vname | nname   |
+--------+-------+---------+
| 1      | Tim   | Hofmann |
| 2      | Tim   | Seitz   |
+--------+-------+---------+

To make it clear here is the table and the join of the three projections.

-- Added one more row (1, Tim, Seitz)
+--------+-------+---------+
| ticket | vname | nname   |
+--------+-------+---------+
| 1      | Andi  | Hofmann |
| 1      | Oli   | Seitz   |
| 1      | Tim   | Seitz   |
| 2      | Oli   | Schmidt |
| 2      | Oli   | Seitz   |
| 2      | Tim   | Schmidt |
| 3      | Tim   | Hofmann |
+--------+-------+---------+


-- JOIN {{Ticket, Vname}, {Vname, Nname}, {Ticket, Nname}}
-- generates two extra rows
-- (1, Tim, Hoffman) and (2, Tim, Seitz)
+--------+-------+---------+
| ticket | vname | nname   |
+--------+-------+---------+
| 1      | Andi  | Hofmann |
| 1      | Oli   | Seitz   |
| 1      | Tim   | Hofmann |
| 1      | Tim   | Seitz   |
| 2      | Oli   | Schmidt |
| 2      | Oli   | Seitz   |
| 2      | Tim   | Schmidt |
| 2      | Tim   | Seitz   |
| 3      | Tim   | Hofmann |
+--------+-------+---------+

Bottom line, the JD does not hold, the original table {Ticket, Vname, Nname} can not be decomposed into projections and is in 6th NF as stated in the first part of this -- too long by now -- answer.


Final Thoughts

And finally, what does it take to determine NF of a relational variable (table in 1NF)? The usual answer is: heading, functional dependencies, and join dependencies. But heading is a set of free variables from the matching predicate. Dependencies are simply formalized notation of constraints. So what does it take? Predicate and the matching set of constraints, that is all. How much data, how many rows? None, not a single tuple (row) is needed. It is good to have few samples to make sure we understand constraints properly, but not necessary. Focusing on sample data is a huge mistake. Relvar (table) is in a specific NF for any set of valid tuples (rows), empty set included. It should not change NF as data keeps streaming in, if implemented properly. The NF is determined on logical design level, way before SQL CREATE TABLE is executed.

Upvotes: 1

philipxy
philipxy

Reputation: 15118

1NF (First Normal Form)

"1NF" has no standard meaning.

Since by definition a relation has one value per column per row, notions of "multi dimensional data in one column" don't make sense. Feel free to ask people to make sense. Feel free to ask them how whatever they do mean matters.

Normalization to higher NFs (normal forms)

The only thing that normalization to higher NFs has to do with "1NF" is that they are both trying to simplify to improve designs.

Your relation satisfies no non-trivial FDs (functional dependencies). So it is in BCNF.

Your relation satisfies no non-nontrivial MVDs (multi-valued dependencies). Ie it satisfies no non-trivial binary JDs (join dependencies). Ie it is not the join of the members of any pair of its projections other than a pair that includes itself. So it is in 4NF. You can see this by taking pairs of projections and joining them. You can also do it by applying definitions of FD & MVD and identifying them, then applying the rules of inference for them.

Your relation satisfies the non-nontrivial JD *{{Ticket, Vname}, {Vname, Nname}, {Ticket, Nname}}. So it is the join of the members of a set of its projections other than a set that includes itself. But that JD is not implied by its CKs. Ie there is no chain of joins of its projections where every join's common attributes includes a CK of the original. So it is not in 5NF. You can see this by taking sets of projections and joining them. There is no algorithm to determine what non-trivial JDs a relation satisfies with complexity better than brute force.

Relation Meanings/Predicates

On the other hand, suppose you knew the relation's meaning to the extent that you knew that it holds tuples that make a true statement from a (characteristic) predicate expressible as the conjunction of others, say

    ticket Ticket was submitted by a person with first name Vname
AND there is a person with name Vname Nname
AND ticket Ticket was submitted by a person with last name Nname

Join is designed so that the predicate of its output is the AND of the predicates of its inputs. So you would know to check for whether any corresponding decompositions of the original satisfy the JD (ie whether the relations from the conjuncts are projections of the original) and so to check whether the JD is implied by the original's CKs.

The point of normalization to higher NFs is that a JD holds when a relation's predicate can be expressed as the conjunction of others and their relations are projections of the original, so we can use the simpler separate relations instead, except we might as well JOIN/AND the relations/predicates on pairwise shared CKs because there are still no update anomalies. (If FD {x, ...} -> a holds then a certain MVD holds & a certain binary JD holds and the predicate of the relation can be expressed as ... AND a = f(x, ...).)

Note that contrary to claims that 5NF is to reduce update anomalies, it turned out that they disappear as of ETNF which lies between BCNF & 5NF. But a 5NF design is still simpler in the sense that there are fewer relations at the cost of adding ANDs to predicates. Note that MVDs & JDs that hold are hard to find only because designs with them are intuitively obviously bad so they never get proposed, because their predicates are the conjunction of others. Thus contrary to claims that 5NF is unimportant because violating JDs are rare, 5NF is the only NF that matters. (SQL systems don't support dealing with all the integrity constraints that can arise from 5NF designs, so that and ignorance leads to claims that one should settle for 3NF.)

You need to find definitions of the NFs and why they matter.

More re predicates & the relational model.

(I only answered this question because received wisdom, even in textbooks, is such a mess.)


Appendix

Projections & joins. (I was going to leave the Minimal, Complete, and Verifiable Example to you. But the JD holding was disputed by another answerer so here is an sqlfiddle.)

T
1      Oli   Seitz
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
3      Tim   Hofmann

project Ticket, Vname (T)
1      Oli
1      Andi
2      Oli
2      Tim
3      Tim

project Vname, Nname (T)
Oli   Seitz
Andi  Hofmann
Oli   Schmidt
Tim   Schmidt
Tim   Hofmann

project Ticket, Vname (T) join project Vname, Nname (T)
1      Oli   Seitz
1      Oli   Schmidt
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
2      Tim   Hofmann
3      Tim   Schmidt
3      Tim   Hofmann

project Ticket, Nname (T)
1      Seitz
1      Hofmann
2      Seitz
2      Schmidt
3      Hofmann

     project Ticket, Vname (T) join project Vname, Nname (T)
join project Ticket, Nname (T)
1      Oli   Seitz
1      Andi  Hofmann
2      Oli   Seitz
2      Oli   Schmidt
2      Tim   Schmidt
3      Tim   Hofmann

Upvotes: 3

Related Questions