Reputation: 2051
I'm trying to determine the best way to represent the concept of "linked" contacts (people) in an existing mysql database. Let me explain what that means:
I have a table with different contacts, each with an ID. I have another table with different time periods that a contact can belong to. Each period also have an ID. There's a N:1 relationship between contacts and periods. I'd like to be able to represent that two (or more) contacts in different periods are actually the same person (for analytics sake). So if two contacts are linked, they should be treated as one for analytics purposes. For example, I might want to represent that contacts 1 and 3 are the same person.
The difficulty is in representing this concept of "linked" for more than two contacts (e.g. contacts 1, 3, and 4). If there were just two, I could simply have a table with two columns representing the connection. However, since there could be three contacts who are linked (or any number, really), this doesn't work out of the box.
I've come up with two possible solutions for now:
1) Have one table (let's call it linked_contacts) that has two columns, one for each contact ID. This will represent a connection. If there are more than 2 linked contacts, then add a connection to one of the linked contacts. The upside of this is that it's cheap to add another contact into the link. The downside is that in order to get all contacts in the link, I essentially have to construct the graph by making a query for each connection.
2) Still have the linked_contacts table with two columns. When adding a new contact to the link, generate a new row for each connection between the new contact and the pre-existing ones. The upside is that I can get all the linked contacts for a particular contact in a single query. The downside is a larger table.
Additional notes
I'm leaning towards option 2 for ease/speed of retrieval of all contacts in a link given a particular contact. Is the right way to go about it? Are there any other factors that I should take into account? I'm open to other design strategies as well!
Upvotes: 0
Views: 146
Reputation: 52137
I think the "linked list" is a wrong abstraction of your requirements.
If you need to represent the concept of "person" in the database, then just represent it directly, as its own table. For example:
All contacts of the same person share the same PERSON_ID, which implicitly connects them.
Effectively, you turn the CONTACT into a "junction" (aka. "link") table between PERSON and PERIOD, which is a standard way of modeling many-to-many relationship. You can then easily find all persons whose contacts belong to the given period, similarly to this:
SELECT DISTINCT PERSON_ID
FROM CONTACT
WHERE PERIOD_ID = <whatever>
And since there is an index on {PERIOD_ID, PERSON_NO}1, this query can be satisfied by a simple index range scan, which is very fast2.
You can also easily JOIN with the PERSON table if you need to get person's other fields, and/or with PERIOD if you need other fields from there.
1Implicitly created underneath the UNIQUE constraint, indicated by U1 in the diagram above.
2 BTW, you'll also probably need an index in the "opposite" direction: {PERSON_ID, PERIOD_ID}, to satisfy searching for contacts of a given person.
Upvotes: 2
Reputation: 39434
Since you don't expect to have many links, and you are concerned about performance as well as ease of retrieval, and management is insert-heavy, then I would have a separate table with multiple columns representing the links.
CREATE TABLE links (
contact_id0 BIGINT UNSIGNED NULL REFERENCES contacts (id),
contact_id1 BIGINT UNSIGNED NULL REFERENCES contacts (id),
contact_id2 BIGINT UNSIGNED NULL REFERENCES contacts (id),
contact_id3 BIGINT UNSIGNED NULL REFERENCES contacts (id)
);
INSERT INTO links VALUES (1,3,4,NULL);
-- find 3's links
SELECT * WHERE 3 IN (contact_id0,contact_id1,contact_id2,contact_id3);
You might also consider proper normal form of either an edge list or adjacency matrix, but the design above seems to meet your requirements with minimum complexity.
The one design tradeoff is the maximum number of links between contacts is fixed by the number of columns. But as mentioned in my comments, columns are cheap and can be added if demand warrants.
FWIW, if you wanted either your Option 1 or Option 2 and weren't open to other ideas, I would go with your #2.
Upvotes: 1