MikeD
MikeD

Reputation: 8941

How to optimize a Database Model for a M:N relationship

Edit 10-Apr-2013

In order to make myself clear I am adding another (simplified) example showing the principle of what I am trying to achieve:

T1 - PERSONHAS                T2 - PRODUCTNEED
ANTON    has    WHEEL         CAR      need    ENGINE
ANTON    has    ENGINE        CAR      need    WHEEL
ANTON    has    NEEDLE        SHIRT    need    NEEDLE
BERTA    has    NEEDLE        SHIRT    need    THREAD
BERTA    has    THREAD        JAM      need    FRUIT
BERTA    has    ENGINE        JAM      need    SUGAR

Q3 - PERSONCANMAKE
ANTON    canmake    CAR
BERTA    canmake    SHIRT

Q4 - PERSONCANNOTMAKE
ANTON    cannotmake    SHIRT
ANTON    cannotmake    FRUIT
BERTA    cannotmake    CAR
BERTA    cannotmake    FRUIT

I have T1 and T2 and want to create queries for Q3 and Q4

End Edit 10-Apr-2013

Preface:

In order to create a product (P) I need to have certain generic capabilities (C - like a factory, supply, electricity, water, etc.) A product manager defines all generic capabilities needed to create his/her product.

In a location (L) I have certain generic capabilities (C) A location manager defines the capabilities his/her location is able to provide. This could be a clear YES, a clear NO, or the location manager does not list a certain capability at all.

DB Model:

I have created the following root entities

Location (PK: L) - values L1, L2, L3     // in real ca. 250 rows of L
Product (PK: P) - values P1, P2          // in real ca. 150 rows of P
Capability (PK: C) - values C1, C2, C3   // in real ca.  80 rows of C

and the following child (dependent) entities

ProductCapabilityAssignment:P, C (PK: P, C, FK: P, C)
    P1 C1
    P1 C2
    P2 C1
    P2 C3

LocationCapabilityAssignment: L, C, Status (Y/N) (PK: L, C, FK: L, C)
    L1 C1 Y
    L2 C1 Y
    L2 C2 Y
    L2 C3 N
    L3 C1 Y
    L3 C2 Y
    L3 C3 Y

Task:

The task is to find out whether a certain product can be produced at a certain location, whereby all capabilities defined for the product must be present at that location. In order to answer this I couldn't help myself but to

create a Cartesian Product of Location and ProductCapabilityAssignment (CL_Cart) to ensure that for each location I am listing all possible products with their cpability needs

CREATE VIEW CL_Cart AS
SELECT L.L, PCA.P, PCA.C
FROM Location AS L, ProductCapabilityAssignment AS PCA;

create an outer join between CL_Cart and LocationCapabilityAssignment to match in all capabilities a location can provide

CREATE VIEW Can_Produce AS
SELECT X.L, X.P, X.C, LCA.Status
FROM CL_CArt AS X LEFT JOIN LocationCapabilityAssignment AS LCA ON (X.C = LCA.C) AND (X.L = LCA.L);

so that finaly I get

SELECT L, P, C, Status
FROM Can_Produce;
    L1 P1 C1 Y
    L1 P1 C2 NULL        // C2 not listed for L1
    L1 P2 C1 Y
    L1 P2 C3 NULL        // C3 not listed for L1
    L2 P1 C1 Y
    L2 P1 C2 Y
    L2 P2 C1 Y
    L2 P2 C3 N           // C3 listed as "No" for L2
    L3 P1 C1 Y
    L3 P1 C2 Y
    L3 P2 C1 Y
    L3 P2 C3 Y

meaning that L1 cannot produce neither P1 nor P2, L2 can produce P1, and L3 can produce both P1, P2.

So I can query Can_Produce for a specific product/location and see what I have and what I don't have in terms of capabilities. I also can provide a shortcut overall YES/NO answer by examining Status="N" OR Status is NULL - if so the product cannot be produced.

Question:

For a relational database like MSSQL, MySQL, Oracle (not yet decided and beyond my influence) I am wondering if I have chosen the correct data model for this M:N relationship or if I could do any better. In particular I fear that with ca. 250 locations, 150 products and one product in average being defined by +/- 10 capabilities, so to say a Cartesian product of 375.000 rows, that performance will collapse due to huge memory consumption.

I would also really like to avoid stored procedures.

Any thoughts would be welcome.

Upvotes: 2

Views: 227

Answers (2)

Fergus Bown
Fergus Bown

Reputation: 1696

I'm not sure what database you are using, but here is an example that would work in sql server - shouldn't require many changes to work in other databases...

WITH ProductLocation
AS
(
    SELECT  P.P,
            P.Name as ProductName,
            L.L,
            L.Name as LocationName
    FROM    Product P
    CROSS
    JOIN    Location L
),
ProductLocationCapability
AS
(
    SELECT  PL.P, 
            PL.ProductName, 
            PL.L, 
            PL.LocationName,
            SUM(PC.C) AS RequiredCapabilities,
            SUM(CASE WHEN LC.L IS NULL THEN 0 ELSE 1 END) AS FoundCapabilities
    FROM    ProductLocation PL
    JOIN    ProductCapabilityAssignment PC
            ON  PC.P = PL.P
    LEFT
    JOIN    LocationCapabilityAssignment LC
            ON  LC.L = PL.L
            AND LC.C = PC.C
    GROUP BY PL.P, PL.ProductName, PL.L, PL.LocationName
)
SELECT  PLC.P, 
        PLC.ProductName,
        PLC.L,
        PLC.LocationName,
        CASE WHEN PLC.RequiredCapabilities = PLC.FoundCapabilities THEN 'Y' ELSE 'N' END AS CanMake
FROM    ProductLocationCapability PLC

(Not sure if the field names are correct, I couldn't quite make sense of the schema description!)

Upvotes: 1

Alocyte
Alocyte

Reputation: 333

     --Environment Variables    
Declare @Parent table (id int identity(1,1) primary key, Name varchar(20))  
Declare @Components table (id int identity(1,1) primary key, Name varchar(20))  Insert into @Components (Name) values ('Engine'),('Wheel'),('Chassis'),('NEEDLE'),('THREAD'),('FRUIT'),('SUGAR')
Declare @Objects table (id int identity(1,1) primary key, Name varchar(20)) 
Declare @Template table (id int identity(1,1) primary key, Name varchar(20), ObjectID int, ComponentID int) 
Insert into @Template (Name, ObjectID, ComponentID) 
Select 'Vehicle', O.ID, C.ID from @Objects O, @Components C where O.Name = 'Car' and C.Name in ('Engine','Wheel','Chassis')union    
Select 'Clothing', O.ID, C.ID from @Objects O, @Components C where O.Name = 'Shirt' and C.Name in ('Needle','Thread') union 
Select 'Confectionary', O.ID, C.ID from @Objects O, @Components C where O.Name = 'JAM' and C.Name in ('FRUIT','SUGAR')  
Declare @AvailableMaterials table (id int identity(1,1) primary key, TestType varchar(20), ParentID int, ComponentID int)   

--Test Data 
Insert into @AvailableMaterials (TestType,ParentID,ComponentID) 
Select 'CompleteSet', P.ID, T.ComponentID from @Parent P, @Template T where P.Name = 'Driver' and T.Objectid = (Select ID from @Objects where Name = 'Car') union   
Select 'CompleteSet', P.ID,  T.ComponentID from @Parent P, @Template T where P.Name = 'Seamstress' and T.Objectid = (Select ID from @Objects where Name = 'Shirt') union    
Select 'IncompleteSet', P.ID,  T.ComponentID from @Parent P, @Template T where P.Name = 'Confectionarist' and T.Objectid = (Select ID from @Objects where Name = 'Jam')     
and T.ComponentID not in (Select ID from @Components where Name = 'FRUIT')  




--/*What sets are incomplete?*/ 
Select *    
from @AvailableMaterials    
where ID in (   
Select  SRCData.ID  
from @AvailableMaterials SRCData cross apply (Select ObjectID from @Template T where ComponentID = SRCData.ComponentID) ObjectsMatchingComponents   
inner join @Template T  
on SRCData.ComponentID = T.ComponentID  
and T.ObjectID = ObjectsMatchingComponents.ObjectID 
cross apply (Select ObjectID, ComponentID from @Template FullTemplate where FullTemplate.ObjectID = T.ObjectID and FullTemplate.ComponentID not in (Select ComponentID from @AvailableMaterials SRC where SRC.ComponentID = FullTemplate.ComponentID)) FullTemplate 
)   


/*What sets are complete?*/ 
Select *    
from @AvailableMaterials    
where ID not in (   
Select  SRCData.ID  
from @AvailableMaterials SRCData cross apply (Select ObjectID from @Template T where ComponentID = SRCData.ComponentID) ObjectsMatchingComponents   
inner join @Template T  
on SRCData.ComponentID = T.ComponentID  
and T.ObjectID = ObjectsMatchingComponents.ObjectID 
cross apply (Select ObjectID, ComponentID from @Template FullTemplate where FullTemplate.ObjectID = T.ObjectID and FullTemplate.ComponentID not in (Select ComponentID from @AvailableMaterials SRC where SRC.ComponentID = FullTemplate.ComponentID)) FullTemplate 
)   

Hi

This is the best I can come up with... It works on the premise that you have to know what the complete set is, to know what's missing. Once you have what's missing, you can tell the complete sets from the incomplete sets.

I doubt this solution will scale well, even if moved to #tables with indexing. Possibly though...

I too would be interested in seeing a cleaner approach. The above solution was developed in a SQL 2012 version. Note cross apply which limits the Cartesian effect somewhat.

Hope this helps.

Upvotes: 2

Related Questions