Reputation: 2986

Having composite primary keys in N-M relation

From Previous Question I got some idea about having composite primary key in n-m relationships. My condition is very similary yet a little different.

[Person]

    PERSON_ID: int, primary key
    CLIENT_ID: int, primary key
    Name: nvarchar(xx)
    primary_key( PERSON_ID, CLIENT_ID);

[Group]

    GROUP_ID: int
    CLIENT_ID: int
    Name: nvarchar(xx)
    primary_key( GROUP_ID, CLIENT_ID);

Now when I use mysql-workbench tool to generate a relationship table what it does is it creates a table:

option 1:

    [PersonHasGroup]

    PERSON_ID: int 
    CLIENT_ID: int 
    GROUP_ID: int
    CLIENT1_ID: int
primary_key( PERSON_ID, CLIENT_ID, GROUP_ID, CLIENT1_ID);

In my case both client_id would normally have the same value so I edited the table to look like

option 2:

    [PersonHasGroup]

    PERSON_ID: int
    CLIENT_ID: int
    GROUP_ID: int
primary_key( PERSON_ID, CLIENT_ID, GROUP_ID);

Are those a good practices? What my other colleagues prefer is a little different. they use:

Option 3:

   [PersonHasGroup]
    PERSON_HAS_GROUP_ID: int, auto-increment
    CLIENT_ID: int        
    PERSON_ID: int, foreign key
    GROUP_ID: int, foreign key
primary_key( PERSON_HAS_GROUP_ID, CLIENT_ID);

Which of the practise would be suitable when my relationship would be many-to-many explanation with an example would be a lot helpful :)

Upvotes: 2

Answers (5)

dbenham

Reputation: 130819

There should not be any debate between the presented options. There is one correct answer, but that answer depends on your business rules.

Your statement "In my case both client_id would normally have the same value" concerns me. Terms like normally and usually don't help when designing a system. You need to talk in terms of absolutes.

If the CLIENT_ID must be the same for both relationships, then option 2 (one single CLIENT_ID) is the correct design.

But if it is possible (perhaps rare, but possible) that each relationship can be based on different CLIENT_IDs, then of course you need option 1 with two uniquely named CLIENT_ID columns. (I can't imagine the business case for this, but your model is already very odd, so I'm not so sure)

No one can help you determine which is correct unless you fully explain your business requirements, but you indicate that is not possible.

Option 3 should not be considered an option - If you are going to introduce a surrogate key, then it should not be a composite surrogate key.

I personally would never create a surrogate key for a junction table (table that resolves a many to many relationship), unless the junction table PK can be a foreign key in another table. In that case I might create a surrogate key for performance reasons, but it is not a simple choice. If a surrogate key is created, then you should declare both a primary key and an alternate key: One with the natural composite key, and the other with the surrogate key. You want to make sure that a given Group/Person pair is entered only once, and you want the surrogate to be unique as well.

So both options 1 and 2 could be modified to include a surrogate key if your particular situation warrants it for performance reasons:

Option 1A

[PersonHasGroup]
  PERSON_ID: int
  PERSON_CLIENT_ID: int
  GROUP_ID: int
  GROUP_CLIENT_ID: int
  PERSON_HAS_GROUP_ID: int
primary_key( PERSON_ID, PERSON_CLIENT_ID, GROUP_ID, GROUP_CLIENT_ID);
alternate_key( PERSON_HAS_GROUP_ID )

Option 2A

[PersonHasGroup]
  PERSON_ID: int
  CLIENT_ID: int
  GROUP_ID: int
  PERSON_HAS_GROUP_ID: int
primary_key( PERSON_ID, CLIENT_ID, GROUP_ID);
alternate_key( PERSON_HAS_GROUP_ID )

Of course you could reverse which is the primary and which is the alternate.

Upvotes: 3

Damir Sudarevic

Reputation: 22187

If I understand this correctly, the following should be true:

You have several servers (DBs), one for each client.
Each DB has its own auto-increment for PersonID and GroupID.
You are trying to simply (directly) import all that into one multi-tenant DB; hence the ClientID in primary keys of Person and Group tables.

If these statements are not correct, ignore the rest.

enter image description here

create table PersonGroup (
    PersonID integer
  , GroupID  integer
  , ClientID integer
);

alter table PersonGroup add constraint  pk_PersonGroup
            primary key (PersonID, GroupID, ClientID);

alter table PersonGroup add constraint fk1_PersonGroup
            foreign key (PersonID, ClientID) references Person(PersonID, ClientID);

alter table PersonGroup add constraint fk2_PersonGroup
            foreign key (GroupID, ClientID) references Group(GroupID, ClientID);

Upvotes: 1

DeveloperInToronto

Reputation: 1193

Good question. But lets take a quick look at your Person and Group tables first.

Considering the name of your tables, each row in Person seems to be represnting a person and each row in Group seems to be representing a group. So conceptually, to identify a person or group you don't need Client_ID, thus you don't need it as a part of the primary key. If your model requires each person to have a unique Client_ID (in addition to your Person_ID) just make it a unique field but don't include it in the primary key.

The situation is a little more confusing with the Group table. The question is why a group cannot be identified by Group_ID alone? (The Client_ID field is relating a group to a client. But what is the nature of this relationship? Is each group created by a client? in which case you don't need to have Client_ID as a part of the key, you just need to have it as a foreign key. Is it a membership relationship? In which case you need an N to M relationship which is represented by another table).

Then we get to the relationship between Person and Group. An N to M relationship is always translated into a table consisting of the PK of both tables + any properties of the N to M relationship (such as the date of the membership). In this case you'll only need to have a table with two fields of Person_ID and Group_ID, and both of them are a part of the composite key.

You also have to be careful about the naming.. what does PersonHasGroup represent? Does the person own the group? Manages it? or it's just a membership. If it is a membership something as simple as "PersonGroup" would be good enough.

Make sure you see entities conceptually correctly, otherwise you'll fall into the trap of adding up fields to primary keys and linking tables which do not satisfy your needs fully. Keep properties of an entity encapsulated in it and don't mix them. Ask yourself "how do I identify an entity in the real world?" .. and there you have your answer to what the primary key should be. Things like "In my case both client_id would normally have the same value" are an indication that you are not seeing concepts properly and mixing them up resulting in redundancy and incorrect relationships.

Finally, I understand that sometimes a poorly designed database is handed to you and you are asked to introduce new links and entities. My advice would be to try to make corrections as much as possible. It'll rescue you in the long run.

So in my opinion your Previous Design is a far better design compared to the one in this question.

Upvotes: 0

Puggan Se

Reputation: 5846

I prefer to have a single id for easy referencing, and another unique key for the unique pair, never used workbench, so here is it first in plain sql:

CREATE TABLE person_has_group (
    person_has_group_id SERIAL,
    client_id BIGINT UNSIGNED NOT NULL REFERENCES clients (client_id),
    person_id BIGINT UNSIGNED NOT NULL,
    group_id BIGINT UNSIGNED NOT NULL,
    FOREIGN KEY (client_id, person_id) REFERENCES persons (client_id, person_id),
    FOREIGN KEY (client_id, group_id) REFERENCES groups (client_id, group_id),
    UNIQUE KEY (client_id, person_id, group_id)
);

and proberly somthing like this in "workbench":

[PersonHasGroup]
    PERSON_HAS_GROUP_ID: int, auto-increment
    CLIENT_ID: int        
    PERSON_ID: int, foreign key
    GROUP_ID: int, foreign key
    primary_key( PERSON_HAS_GROUP_ID)
    unique_key( CLIENT_ID, PERSON_ID, GROUP_ID);

Upvotes: 2

Omesh

Reputation: 29081

I think option 2 will be better as it also adds UNIQUE CONSTRAINT on columns ( PERSON_ID, CLIENT_ID, GROUP_ID) so that no duplicates will get inserted into your tables.

Upvotes: 0

Having composite primary keys in N-M relation

Answers (5)

Related Questions