dexter
dexter

Reputation: 13593

Modelling Hierarchical attributes

I am trying to figure out the data model for a retailer.

The retailer has several stores across the country and they are modeled using the following hierarchy:

Channel -> Zone -> City -> Store

Each store contains several articles. And each article has attributes like

Now, the retailer can set these attributes at any level in the hierarchy. Consider the following cases:

As of now, they have modeled it using RDBMS by defining global rules at the top of hierarchy and calling out exceptions separately as individual rows. Say, price table, will have price set for an article at Channel level and any changes at any level will be specified separately. Obviously, this is not very efficient when fetching the attributes at the store level.

Sample Data

Assume Channel, Zone, City and Store are collectively called entities. Channel will have ids ranging >= 4000, Zone >= 3000, City >= 2000 and stores range from 0 to 1000.

A subset of the hierarchy relationship data is given below:

Channel   | Zone     | City    | Store |
----------+----------+------------------
4001      | 3001     | 2001    | 13    |
4001      | 3001     | 2001    | 14    |
4001      | 3001     | 2002    | 15    |
4001      | 3002     | 2003    | 16    |
4001      | 3003     | 2006    | 74    |

Price Table

ArticleID | EntityID | Price
----------+----------+----------
12345     | 4001     | 2.5
12345     | 2003     | 2.9
12345     | 74       | 3.0

Here, the price 2.5 for channel 4001 will be set for the article in all stores. The next two rows set the price exceptions in some stores. The second price 2.9 set for city 2003 will be applicable only for the article in store 16 as per the hierarchy relationship mentioned above. And the third row directly sets price 3.0 for the article in store 74.

Hope this gives an idea of current model. So, can you please suggest a better way to store this?

Upvotes: 2

Views: 291

Answers (1)

Damir Sudarevic
Damir Sudarevic

Reputation: 22177

There are few ways to represent trees and hierarchies in relational model -- a search on SO will return quite a few answers.

The main idea in this model is to represent hierarchy with a closure table and a hierarchy-level. So, the Level table has (1, Channel) , (2, Zone) , (3, City) , (4, Store).

The closure table exposes each node and all descendants of the node. It is important to note that each node is its own descendant too.

enter image description here

The first CTE query (q_00) selects a node's price and assigns it to all descendants. The LevelDiff column calculates how many levels from the descendant node is the price-specifying node.

Because the price may be specified on more than one level, the final query selects store prices specified on the level with minimum LevelDiff.

The syntax is PostgeSQL, but should be easy to convert to others too.

with q_00 as (
    select
          a.ProductID
        , c.LocationID
        , c.LocationLevel
        , a.Price
        , t.DescendantLocationID
        , t.DescendantLevel
        , (t.DescendantLevel - c.LocationLevel ) as LevelDiff
    from ProductPrice as a
    join Product      as b on b.ProductId  = a.ProductID
    join Location     as c on c.LocationID = a.LocationID
    join TreeClosure  as t on t.LocationID = c.LocationID
)
select
      a.ProductID
    , DescendantLocationID
    , Price
from q_00  as a
join Level as w on w.LevelNo = a.DescendantLevel
where w.LevelName = 'Store'
  and a.Leveldiff = (select min(LevelDiff)
                       from q_00 as x 
                       where x.DescendantLocationID = a.DescendantLocationID 
                         and x.ProductID = a.ProductID )    ;

So to summarize, here is a test result, pricing was defined as:

Channel=1,                        Product=1, Price = 11.0
Channel=1, City=111,              Product=1, Price = 11.5
Channel=1, City=111, Store =1112, Product=1, Price = 12.0

The query returns (see test data below)

ProductID | DescendantLocationID | PriceID
-----------------------------------------
    1         1231                  11.00
    1         1232                  11.00
    1         1111                  11.50
    1         1112                  12.00

Here is the DDL (PosgreSQL)

CREATE TABLE Level ( 
    LevelNo              integer      NOT NULL ,
    LevelName            varchar(20)  NOT NULL 
);
ALTER TABLE Level ADD CONSTRAINT XPKLevel PRIMARY KEY (LevelNo) ;


CREATE TABLE Location ( 
    LocationID           integer  NOT NULL ,
    LocationLevel        integer  NOT NULL 
);
ALTER TABLE Location ADD CONSTRAINT XPKLocation  PRIMARY KEY (LocationID);
ALTER TABLE Location ADD CONSTRAINT XAK1Location UNIQUE      (LocationID, LocationLevel) ;


CREATE TABLE Product ( 
    ProductID            integer  NOT NULL 
);
ALTER TABLE Product ADD CONSTRAINT XPKProduct PRIMARY KEY (ProductID);


CREATE TABLE ProductPrice ( 
    ProductID            integer        NOT NULL ,
    LocationID           integer        NOT NULL ,
    Price                decimal(19,2)  NOT NULL 
);
ALTER TABLE ProductPrice ADD CONSTRAINT XPKProductPrice PRIMARY KEY (ProductID, LocationID);

CREATE TABLE ProductSupplier ( 
    ProductID            integer  NOT NULL ,
    LocationID           integer  NOT NULL ,
    SupplierID           integer  NOT NULL 
);
ALTER TABLE ProductSupplier ADD CONSTRAINT XPKProductSupplier PRIMARY KEY (ProductID, LocationID);


CREATE TABLE Supplier ( 
    SupplierID           integer  NOT NULL 
);
ALTER TABLE Supplier ADD CONSTRAINT XPKSupplier PRIMARY KEY (SupplierID) ;

CREATE TABLE TreeClosure ( 
    LocationID           integer  NOT NULL ,
    DescendantLocationID integer  NOT NULL ,
    DescendantLevel      integer  NOT NULL 
);
ALTER TABLE TreeClosure ADD CONSTRAINT XPKTreeClosure PRIMARY KEY (LocationID, DescendantLocationID);


ALTER TABLE Location
    ADD CONSTRAINT FK1_Location FOREIGN KEY (LocationLevel) REFERENCES Level(LevelNo);


ALTER TABLE ProductPrice
    ADD CONSTRAINT FK1_ProductPrice FOREIGN KEY (ProductID) REFERENCES Product(ProductID);


ALTER TABLE ProductPrice
    ADD CONSTRAINT FK2_ProductPrice FOREIGN KEY (LocationID) REFERENCES Location(LocationID);


ALTER TABLE ProductSupplier
    ADD CONSTRAINT FK1_PrdSup FOREIGN KEY (ProductID) REFERENCES Product(ProductID);

ALTER TABLE ProductSupplier
    ADD CONSTRAINT FK2_PrdSup FOREIGN KEY (SupplierID) REFERENCES Supplier(SupplierID);


ALTER TABLE ProductSupplier
    ADD CONSTRAINT FK3_PrdSup FOREIGN KEY (LocationID) REFERENCES Location(LocationID);


ALTER TABLE TreeClosure
    ADD CONSTRAINT FK1_TC FOREIGN KEY (LocationID) REFERENCES Location(LocationID);

ALTER TABLE TreeClosure
    ADD CONSTRAINT FK2_TC FOREIGN KEY (DescendantLocationID,DescendantLevel) REFERENCES Location(LocationID,LocationLevel);

And some data to test with

insert into Level (LevelNo, LevelName)
values
  (1, 'Channel') 
, (2, 'Zone') 
, (3, 'City') 
, (4, 'Store')
;

insert into Product (ProductID)
values  (1) , (2) , (3) 
;

-- Locations
insert into Location (LocationID, LocationLevel)
values
  (1,    1) 
, (11,   2)
, (111,  3)
, (1111, 4)     
, (1112, 4)
, (12,   2)
, (123,  3)
, (1231, 4) 
, (1232, 4)     
;

-- Tree closure (hierarchy)

insert into TreeClosure (LocationID, DescendantLocationID, DescendantLevel)
values
  (1   , 1   , 1)
, (1   , 11  , 2)
, (1   , 111 , 3)
, (1   , 1111, 4)
, (1   , 1112, 4)
, (11  , 11  , 2)
, (11  , 111 , 3)
, (11  , 1111, 4)   
, (11  , 1112, 4)
, (111 , 111 , 3)
, (111 , 1111, 4)   
, (111 , 1112, 4)
, (1111, 1111, 4) 
, (1112, 1112, 4)

--  

, (1   , 12  , 2)
, (1   , 123 , 3)
, (1   , 1231, 4) 
, (1   , 1232, 4)
, (12  , 12  , 2)
, (12  , 123 , 3)
, (12  , 1231, 4) 
, (12  , 1232, 4)
, (123 , 123,  3)
, (123 , 1231, 4) 
, (123 , 1232, 4)   
, (1231, 1231, 4) 
, (1232, 1232, 4)
;

--  pricing
insert into ProductPrice (ProductID, LocationID, Price) values (1, 1   , 11.0);
insert into ProductPrice (ProductID, LocationID, Price) values (1, 111 , 11.5);
insert into ProductPrice (ProductID, LocationID, Price) values (1, 1112, 12.0);

Upvotes: 2

Related Questions