Dave Williams

Reputation: 2246

Are multiple Many-Many relationships evidence of bad design?

Good day all,

I have been learning about databases and database design, and I find I have reached a question I cannot answer myself. So I pose it to the community in the hope that someone with more knowledge and experience than I can answer it.

I have been tasked with working on a database which tracks stock levels across a fleet of ships.

The current design has a table for each ship with a list of all possible parts (Machinery Type, Part Number, Make, Serial No, etc.).

This means that the details of a piece of machinery or part can be duplicated many times (as many times as there are ships in fact).

I have been experimenting with a redesign based on what I have learnt myself, and I would propose a design along the following lines:

[SHIP]
ID, Name, Class, Tonnage, Fleet, Superintendent etc.

[Machinery]
ID, Type, Make, Model etc. (Can have separate table for manufacturers and types if required)

[Part]
ID, Part number, Description, etc.

The above would be the three main tables; now is where it starts to get difficult.

Each ship can have multiple items of machinery, and each machinery item could be present on multiple ships (this requires a junction table).

Each machinery item can have multiple parts, and each part could belong to multiple machinery items (another junction table).
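
For illustration, here is a rough SQL sketch of how I imagine the main tables and the two junctions (table names, column names and types are just my working assumptions, not a finished design):

    -- Main entity tables (columns trimmed for brevity)
    CREATE TABLE Ship (
        ShipID   INT PRIMARY KEY,
        Name     VARCHAR(100),
        Class    VARCHAR(50),
        Tonnage  INT
    );

    CREATE TABLE Machinery (
        MachineryID INT PRIMARY KEY,
        Type        VARCHAR(50),
        Make        VARCHAR(50),
        Model       VARCHAR(50)
    );

    CREATE TABLE Part (
        PartID      INT PRIMARY KEY,
        Description VARCHAR(200)
    );

    -- Junction: which machinery items are fitted on which ships
    CREATE TABLE ShipMachinery (
        ShipID      INT REFERENCES Ship (ShipID),
        MachineryID INT REFERENCES Machinery (MachineryID),
        PRIMARY KEY (ShipID, MachineryID)
    );

    -- Junction: which parts belong to which machinery items
    CREATE TABLE MachineryPart (
        MachineryID INT REFERENCES Machinery (MachineryID),
        PartID      INT REFERENCES Part (PartID),
        PRIMARY KEY (MachineryID, PartID)
    );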

There could be well into the hundreds of thousands of parts, which would make the junction tables huge.

Additionally, as soon as you want to keep track of stock you are looking at another junction table:

[Stock Level]
ShipID, PartID, Stock Level

Also, if you wanted a minimum stock (could this be combined with Stock Level? see the sketch below):

[Min Stock]
ShipID, PartID, Min Stock
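
If the two were combined, I imagine it would look something like this (again just a sketch, with MinStock as a nullable column):

    -- Combined stock table: current level and minimum level per ship/part
    CREATE TABLE StockLevel (
        ShipID     INT REFERENCES Ship (ShipID),
        PartID     INT REFERENCES Part (PartID),
        StockLevel INT NOT NULL,
        MinStock   INT,              -- NULL where no minimum has been set
        PRIMARY KEY (ShipID, PartID)
    );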

And finally, if you were looking to have a normalised database (i.e. no Part No. 1, Part No. 2 or Serial No. 1, Serial No. 2), you would need a few extra tables:

[Serial Numbers]
ShipID, MachineryID, Serial No

[Part Numbers]
PartID, Part Number

Serial Numbers is probably going to be fairly standard and no problem; however, [Part Numbers] will require at least as many records as there are in the [Part] table.
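
As a rough sketch of those two tables (assuming a machinery item can carry several serial numbers per ship, and a part several part numbers):

    -- One row per serial number of a machinery item fitted on a ship
    CREATE TABLE SerialNumbers (
        ShipID      INT REFERENCES Ship (ShipID),
        MachineryID INT REFERENCES Machinery (MachineryID),
        SerialNo    VARCHAR(50),
        PRIMARY KEY (ShipID, MachineryID, SerialNo)
    );

    -- One row per part number recorded for a part
    CREATE TABLE PartNumbers (
        PartID     INT REFERENCES Part (PartID),
        PartNumber VARCHAR(50),
        PRIMARY KEY (PartID, PartNumber)
    );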

Map (As best as I can represent without a picture, junctions omitted for simplicity)

  <>V represent many 
  -| represent one

   -----< Serial Numbers
  |             V
  |             |
Ship >---< Machinery >---< Parts ---< Part Numbers
  V                          V
  |                          |
   ------ Stock Level -------
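
To show how this would actually be used, here is the sort of query I have in mind, using the combined stock table sketched above (purely illustrative):

    -- Parts on one ship whose stock has fallen below the minimum
    SELECT p.PartID, p.Description, sl.StockLevel, sl.MinStock
    FROM   StockLevel sl
    JOIN   Part p ON p.PartID = sl.PartID
    WHERE  sl.ShipID = 1               -- hypothetical ship ID
      AND  sl.MinStock IS NOT NULL
      AND  sl.StockLevel < sl.MinStock;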

Now the real question: am I missing something in the basic design principles that would eliminate such huge junction tables, or is this to be expected with this kind of database?

Also, in cases like part numbers, where normalisation requires an additional table with at least the same number of records rather than extra columns in the original table, is this the kind of thing you would later denormalise to improve query speed?

Any hints, tips or pointers to external resources (including other forums, tutorials, books) would be greatly appreciated.

All answers welcome, thank you in advance for any help you provide.

Dave

Upvotes: 3

Views: 230

Answers (2)

egrunin

Reputation: 25053

First, no: many-to-many relationships are not indicators of bad design.

Second, join tables are typically larger than the tables they connect: if you have two tables with 100 records each, the join table can contain up to 10,000 records.

But join tables are nothing but integer pairs, and when you index them they can be very fast.

Edited to add:

You commented:

I kind of thought indexing was done by the DBMS

It's up to you to tell the DBMS what to index. After that, keeping the indexes up-to-date is done automatically.

Some DBMSes have tools that will monitor your system in action and advise you on which indexes would be beneficial.
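
For example, if a ship/machinery junction table has a composite primary key on (ShipID, MachineryID), that already gives you one index; you would add the reverse direction yourself. A sketch, with illustrative table and index names (exact syntax varies by DBMS):

    -- Reverse-direction index so lookups by machinery are also fast
    CREATE INDEX IX_ShipMachinery_Machinery
        ON ShipMachinery (MachineryID, ShipID);

    -- Same idea on the machinery/part junction
    CREATE INDEX IX_MachineryPart_Part
        ON MachineryPart (PartID, MachineryID);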

Your design process is good, so you're probably on the right track.

Upvotes: 7

Branko Dimitrijevic

Reputation: 52107

In addition to what egrunin said, junction tables are often good candidates for clustering (aka index-organized tables), if your DBMS supports it.

For example, clustering the...

[Stock Level]
ShipID, PartID, Stock Level

...would store rows with the same ShipID physically close together, making it very efficient to get all the rows associated with the given ship.

On the other hand, clustering...

[Stock Level]
PartID, ShipID, Stock Level

...would store rows with the same PartID close together, making it efficient to get the rows associated with the given part.

If you need to query in both of these directions, you'll need two indexes ({ShipID, PartID} and {PartID, ShipID}). Normally, secondary indexes in clustered tables must contain a copy of the whole PK, requiring additional storage and causing a double lookup, which makes it expensive to cluster a table with multiple indexes. In this case, however, the two indexes cover the same fields anyway (just in a different order), so there is no overhead in the secondary index. You could even consider covering Stock Level with the secondary index to avoid the double lookup (the clustering index already covers it naturally).

In case your DBMS doesn't support clustering, consider indexing {ShipID, PartID, Stock Level} (or {PartID, ShipID, Stock Level}, or both). This way, your query is covered by the index and there is no need to access the table heap. Essentially, you achieve the effect of clustering, except that additional space is wasted on the (now redundant) table heap.
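
A sketch of that non-clustered alternative might look like this (generic syntax; the index names are illustrative, and exact options vary by DBMS):

    -- Two covering indexes so both ship-centric and part-centric
    -- queries can be answered from an index alone
    CREATE INDEX IX_StockLevel_ShipPart
        ON StockLevel (ShipID, PartID, StockLevel);

    CREATE INDEX IX_StockLevel_PartShip
        ON StockLevel (PartID, ShipID, StockLevel);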


In addition to that, some DBMSes support leading-edge index compression, which greatly reduces the storage needed for the repeated values in junction-table indexes. Compress a clustered table (which is itself an index) for maximal effect.

Upvotes: 2
