I'm having what some would call a rather strange problem/question. Suppose I have a table, which may reference one (and only one) of many different other tables. How would I do that in the best way?? I'm looking for a solution which should work in a majority of databases (MS SQL, MySQL, PostgreSQL etc). The way I see it, there are a couple of different solutions (is any better than the other?): Have one column for each possible reference. Only one of these columns may contain a value for any given row, all others are null. Allows for strict foreign keys, but it gets tedious when the number of "many" (possible referenced tables) gets large Have a two column relationship, i.e. one column "describing" which table is referenced, and one referencing the instance (row in that table). Easily extended when the number of "many" (referenced tables) grows, though I can't perform single query lookup in a straightforward way (either left join all possible tables, or union multiple queries which joins towards one table each) ?? Make sense? What's best practise (if any) in this case? I specifically want to be able to query data from the referenced entity, without really knowing which of the tables are being referenced. How would you do?

Reputation: 51

SQL: Reference one to one-of-many

I'm having what some would call a rather strange problem/question.

Suppose I have a table, which may reference one (and only one) of many different other tables. How would I do that in the best way?? I'm looking for a solution which should work in a majority of databases (MS SQL, MySQL, PostgreSQL etc). The way I see it, there are a couple of different solutions (is any better than the other?):

Have one column for each possible reference. Only one of these columns may contain a value for any given row, all others are null. Allows for strict foreign keys, but it gets tedious when the number of "many" (possible referenced tables) gets large
Have a two column relationship, i.e. one column "describing" which table is referenced, and one referencing the instance (row in that table). Easily extended when the number of "many" (referenced tables) grows, though I can't perform single query lookup in a straightforward way (either left join all possible tables, or union multiple queries which joins towards one table each)
??

Make sense? What's best practise (if any) in this case? I specifically want to be able to query data from the referenced entity, without really knowing which of the tables are being referenced.

How would you do?

Upvotes: 3

Answers (2)

Branko Dimitrijevic

Reputation: 52107

1)

This is legitimate for small number of static tables. If you anticipate a number of new tables might need to be added in the future, take a look at 3) below...

2)

Please don't do that. You'd be forfeiting the declarative FOREIGN KEYs, which is one of the most important mechanisms for maintaining data integrity.

3)

Use inheritance. More info in this post:

What is the best design for a database table that can be owned by two different resources, and therefore needs two different foreign keys?

You might also be interested in looking at:

Upvotes: 1

Gordon Linoff

Reputation: 1269443

Both of these methods are suitable in any relational database, so you don't have to worry about that consideration. Both result in rather cumbersome queries. For the first method:

select . . .
from t left outer join
     ref1
     on t.ref1id = ref1.ref1id left outer join
     ref2
     on t.ref2id = ref2.ref2id . . .

For the second method:

select . . .
from t left outer join
     ref1
     on t.anyid = ref1.ref1id and anytype = 'ref1' left outer join
     ref2
     on t.anyid = ref2.ref2id and anytype = 'ref2' . . .

So, from the perspective of query simplicity, I don't see a major advantage for one versus the other. The second version has a small disadvantage -- when writing queries, you have to remember what the name is for the join. This might get lost over time. (Of course, you can use constraints or triggers to ensure that only a fixed set of values make it into the column.)

From the perspective of query performance, the first version has a major advantage. You can identify the column as a foreign key and the database can keep statistics on it. This can help the database choose the right join algorithm, for instance. The second method does not readily offer this possibility.

From the perspective of data size, the first version requires storing the id for each of the possible values. The second is more compact. From the perspective of maintainability, the first is hard to add a new object type; the second is easy.

If you have a set of things that are similar to each other, then you can consider storing them in a single table. Attributes that are not appropriate can be NULLed out. You can even create views for the different flavors of the thing. One table may or may not be an option.

In other words, there is no right answer to this question. As with many aspects of database design, it depends on how the data is going to be used. Absent other information, I would probably first try to coerce the data into a single table. If that is just not reasonable, I would go with the first option if the number of tables can be counted on one hand, and the second if there are more tables.

Upvotes: 2

SQL: Reference one to one-of-many

Answers (2)

1)

2)

3)

Related Questions