Reputation: 141320
This question is based on the thread.
If we have one-to-many data structure, we need to have a "help-table" to store for instance phonenumbers for one person. Many person cannot have the same phonenumbers.
I look forward for an explanation why we then need two "help-tables" between many-to-many relations. An example of this is a question-site where many users can add the same tags:
alt text http://files.getdropbox.com/u/175564/db/db-55.png
Why do we need to have the tables Question-Tag-xref
and Question-Tags
?
Why cannot we just have one table for tags as follows?
Question_id | tag
1 C
1 C++
2 Java
2 C
Why is the fact that two different questions have the same tag a problem for a computer?
Upvotes: 1
Views: 3993
Reputation: 18061
Just to add to what others say (I wont repeat their comments)
In my experience, it's not typically called a help table but a join table. Normally you're dealing with something more complicated than a simple keyword. The 'extra' table models the relationship between the 2 other entities.
Another example might be I have a marketing campaign that goes to many recipient contacts. Neither of these 2 entities is dependent on the other. Any particular campaign will have many contacts, and any contact may be sent more than one campaign. The join table in this case models the history of who was sent which campaign.
Campaign
- CampaignID (PK)
- other columns
Contact
- ContactID (PK)
- other columns
CampaignContact
- CampaignContactID (PK)
- CampaignID (FK)
- ContactID (FK)
This is quite different from the 1-many relationship (sometimes called a master-detail relationship). Here a canonical example is Invoice -> InvoiceItems. The invoice items link specifically to one and only one parent invoice.
Invoice
- InvoiceID (PK)
- other columns
InvoiceItem
- InvoiceItemID (PK)
- InvoiceID (FK)
- other columns
Upvotes: 1
Reputation: 161801
That's only one "extra" table.
It's because the same question may have many tags.
And because the same tag may be used by many questions.
You need somewhere to store (questionId, tagId) and to make sure there are no duplicates of that.
I haven't been following your questions on this topic, but it looks like there's some bad design here. I thought you only had one extra table because I assumed you had a sensible structure. You do not.
Why does Question-Tags have both the tag string and a tag id? That doesn't make much sense to me.
I don't want to go back through the sequence of questions. Still, I wanted to try to illustrate what I was talking about. So I created a very simple Object-Role Modeling model of this part of StackOverflow using the NORMA tool:
This generated the following ER diagram:
Note that the "extra" table is all we need to keep for tags, simply because there is no additional information kept about tags. Also, there is no need to store a tag id that is the foreign key to a Tags table, since the tag name is already unique. If we kept additional data about a tag then there would probably be a separate Tags table, with the primary key still being the tag name. That could be changed to use an integer id if it became a performance issue, in which case the tag name would still get a unique index over it.
Upvotes: 4
Reputation: 3294
In a relational database a many-many relationship is implemented as two reciprocal one-many relationships, each of which requires an additional table (beyond the tables directly representing the entities) to implement.
The why of it has to do with the relational database model.
Upvotes: 1
Reputation: 10013
Usually it is a lot more information than just a tag column. So if it is a lot of information then you have redundant data (you have 2 "C" values in your example). Then if the same value lives in more than one place updates become a problem. So the rule is that the data should live in 1 place and its ID is used in other places to reference it. Then when you update it, it only needs to be done in one place.
Upvotes: 0
Reputation: 532595
The issue is one of how normalized you want your table structure to be. Generally, you don't want to store information in more than one place. To that end, when data may be repeated for many items, you normalize it -- move that data to a separate table where multiple rows in the other table may reference it by storing the key of the data rather than the data itself. When you have many rows sharing the same data AND you want to normalize it, you need an intermediate table to store the relations (reference pairs) between the tables.
Upvotes: 1
Reputation: 250
http://en.wikipedia.org/wiki/Database_normalization
it's not a problem for a computer, but RDBMS theory said, that db should by normalized reducing info duplication. Here's what Dr. Codd said about need for normalization:
E.F. Codd, "Further Normalization of the Data Base Relational Model"
Upvotes: 1
Reputation: 45127
It's a question of normalization. IMHO one of the best books on this subject is Joe Celko's SQL for Smarties. Basically, you avoid what are called "anomalies". In your example, if I delete all the questions with the "Java" tag, I would never be able to know that I ever had a tag called "Java" (delete anomaly). It's also important to crack out the table because you need the xref table to describe properties of the relationship between the principals.
Upvotes: 4