Why does many-to-many data structure require two additional tables?

This question is based on the thread.

If we have one-to-many data structure, we need to have a "help-table" to store for instance phonenumbers for one person. Many person cannot have the same phonenumbers.

I look forward for an explanation why we then need two "help-tables" between many-to-many relations. An example of this is a question-site where many users can add the same tags:

alt text http://files.getdropbox.com/u/175564/db/db-55.png

Why do we need to have the tables Question-Tag-xref and Question-Tags?

Why cannot we just have one table for tags as follows?

   Question_id   |    tag
   1                  C 
   1                  C++
   2                  Java
   2                  C

Why is the fact that two different questions have the same tag a problem for a computer?

Upvotes: 1

Views: 3993

Answers (7)

Robert Paulson
Robert Paulson

Reputation: 18061

Just to add to what others say (I wont repeat their comments)

In my experience, it's not typically called a help table but a join table. Normally you're dealing with something more complicated than a simple keyword. The 'extra' table models the relationship between the 2 other entities.

Another example might be I have a marketing campaign that goes to many recipient contacts. Neither of these 2 entities is dependent on the other. Any particular campaign will have many contacts, and any contact may be sent more than one campaign. The join table in this case models the history of who was sent which campaign.

Campaign 
 - CampaignID (PK)
 - other columns

Contact 
 - ContactID (PK)
 - other columns

CampaignContact
 - CampaignContactID (PK)
 - CampaignID (FK)
 - ContactID (FK)

This is quite different from the 1-many relationship (sometimes called a master-detail relationship). Here a canonical example is Invoice -> InvoiceItems. The invoice items link specifically to one and only one parent invoice.

Invoice
 - InvoiceID (PK)
 - other columns

InvoiceItem
 - InvoiceItemID (PK)
 - InvoiceID (FK)
 - other columns

Upvotes: 1

John Saunders
John Saunders

Reputation: 161801

That's only one "extra" table.

It's because the same question may have many tags.

And because the same tag may be used by many questions.

You need somewhere to store (questionId, tagId) and to make sure there are no duplicates of that.


I haven't been following your questions on this topic, but it looks like there's some bad design here. I thought you only had one extra table because I assumed you had a sensible structure. You do not.

Why does Question-Tags have both the tag string and a tag id? That doesn't make much sense to me.


I don't want to go back through the sequence of questions. Still, I wanted to try to illustrate what I was talking about. So I created a very simple Object-Role Modeling model of this part of StackOverflow using the NORMA tool:

Simple model of StackOverflow

This generated the following ER diagram:

ER diagram

Note that the "extra" table is all we need to keep for tags, simply because there is no additional information kept about tags. Also, there is no need to store a tag id that is the foreign key to a Tags table, since the tag name is already unique. If we kept additional data about a tag then there would probably be a separate Tags table, with the primary key still being the tag name. That could be changed to use an integer id if it became a performance issue, in which case the tag name would still get a unique index over it.

Upvotes: 4

Jeff Leonard
Jeff Leonard

Reputation: 3294

In a relational database a many-many relationship is implemented as two reciprocal one-many relationships, each of which requires an additional table (beyond the tables directly representing the entities) to implement.

  • First, a one-many relationship between a row in the first table to many rows in the second table.
  • Second, another one-many relationship between a row of the second table to many rows in the first table.

The why of it has to do with the relational database model.

Upvotes: 1

JBrooks
JBrooks

Reputation: 10013

Usually it is a lot more information than just a tag column. So if it is a lot of information then you have redundant data (you have 2 "C" values in your example). Then if the same value lives in more than one place updates become a problem. So the rule is that the data should live in 1 place and its ID is used in other places to reference it. Then when you update it, it only needs to be done in one place.

Upvotes: 0

tvanfosson
tvanfosson

Reputation: 532595

The issue is one of how normalized you want your table structure to be. Generally, you don't want to store information in more than one place. To that end, when data may be repeated for many items, you normalize it -- move that data to a separate table where multiple rows in the other table may reference it by storing the key of the data rather than the data itself. When you have many rows sharing the same data AND you want to normalize it, you need an intermediate table to store the relations (reference pairs) between the tables.

Upvotes: 1

dotsbyname
dotsbyname

Reputation: 250

http://en.wikipedia.org/wiki/Database_normalization

it's not a problem for a computer, but RDBMS theory said, that db should by normalized reducing info duplication. Here's what Dr. Codd said about need for normalization:

  1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
  2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
  3. To make the relational model more informative to users;
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

E.F. Codd, "Further Normalization of the Data Base Relational Model"

Upvotes: 1

JP Alioto
JP Alioto

Reputation: 45127

It's a question of normalization. IMHO one of the best books on this subject is Joe Celko's SQL for Smarties. Basically, you avoid what are called "anomalies". In your example, if I delete all the questions with the "Java" tag, I would never be able to know that I ever had a tag called "Java" (delete anomaly). It's also important to crack out the table because you need the xref table to describe properties of the relationship between the principals.

Upvotes: 4

Related Questions