Reputation: 18625
I am setting up a database where I'd like to have many-to-many relationships between some tables. There's no user interface for this database; we will be putting data into the tables using R scripts and retrieving it using Python scripts.
The entities involved are projects and cost forecasts. Multiple projects may use the same forecast. For each forecast, there are costs to develop a project in each of several future years. I need to be able to retrieve the cost forecast for each future year for each individual project.
I think the tables below would be a fairly standard way to represent these relationships. Note that "pk" means "primary key" and "fk" means "foreign key".
PROJECT
name
forecast_id (fk)
FORECAST
forecast_id (pk)
COST
forecast_id (fk)
year
cost
To retrieve the forecast for a particular project, I would just retrieve all the rows from COST
that have a matching forecast_id
. I don't need the FORECAST
table for anything, except as a home for the forecast_id
that establishes the many-to-many relationship between PROJECT
and COST
.
So my main question is, can I just drop the FORECAST
table and have a direct many-to-many relationship between PROJECT
and COST
, using the forecast_id
? I know this is physically possible, but many discussions use language along the lines that "many-to-many relationships aren't possible without a bridge table." But why would I want to add the bridge table, if I can do all my queries without it and it is one more table I would have to maintain?
Going further, many discussions of many-to-many relationships (including @mike-organek's comment below) suggest a structure similar to this:
PROJECT
project_id (pk)
name
PROJECT_COST
project_id (fk)
cost_id (fk)
COST
cost_id (pk)
year
cost
While this seems like a commonly preferred approach, it suits my needs even less well. Now every time I add a new project, instead of just assigning the forecast_id
corresponding to a particular forecast, I have to add a bunch of link records to the PROJECT_COST table, one for each future year. This will also require a lot of management, and allows potential creation of relationships I don't want (e.g., one project uses costs from one forecast for the first two years, then costs from a different forecast for the next two years).
So my second question is, is there anything preferable about the second approach over the first approach, or over my simplified approach (using just the PROJECT and COST tables)?
Update
There seems to be some confusion about what I'm asking here. So I've revised the question significantly to try to make it clearer. Note that I renamed cost_group
to forecast
as part of this.
Upvotes: 0
Views: 493
Reputation: 247250
The second approach (with the project_cost
table containing two foreign keys) is the correct way to model a many-to-many relationship.
But your idea with the shared forecast_id
(with or without forecast
table) exhibits that you are not thinking of a many-to-many relationship in the ordinary sense: if one project
is associated with a certain set of cost
s, all other project
s must either be associated with the same or a disjoint set of cost
s.
If that is what you want, I see no problem with removing the forecast
table. There is no referential integrity you are losing that way.
If you have additional requirements, for example that there has to be at least a cost
and a project
for each existing forecast_id
, things may change. That could be guaranteed with foreign keys from the forecast
table, but not without that table.
Upvotes: 3