[Note: start] Hopefully, there are some of you, professionals, that have already had to cope with this kind of situation. The particular case concerning post office is fictional and was made just to present the problem. This question is not about improving performance by adding indexes. [Note: end] Recently I've been wondering how to efficiently(!) create a database structure to handle volatile hierarchy levels. Let's get into the example for a better understanding of the matter. Suppose we have a post office that stores physical mails in different ways depending on really any factor (that doesn't matter here). We are to map this situation onto a database model. Ok, so we have those mails. Mails can be physically stored in cases, boxes, drawers, safe deposits and many many more (because we don't want to strict our storage types and structure, but instead allow it to be flexible to meet our future changes). This means that for now we have such types, and for instance one mail can be inside a box that is stored in a safe deposit, but there can be really any combination. For simplicity of this case let's assume that a single mail is the lowest granularity we can get. We have to remember, though, that we may also have for example empty box (that has no objects both beneath and above it) and we also want to store this in DB. This model has to: be efficient - it can't (or should it?) all be stored in a single table, since we process many mails per day be adjustable - we may need to change our way of storing some already existing objects, so we need those grouped hierarchies to change for some objects be flexible - we are definitely going to need to add new objects in the future and so the hierarchy may change for upcoming and already existing particular objects [My idea] So far I've come up with this idea: Let's store all objects in one self-referencing "hierarchy" table and mark each object to be of some type, so that recursively we can see the path of where the email is located, or show every object with hierarchy that on the top depends on safe deposit. This method would require : a) every record to be detailed in this table, which makes it contain many null values, since mails are not described by the same attributes as a drawer, b) or each object type might have its own table describing it, with a foreign key in "hierarchy table" [Warning] This table can grow so big that lookup may cause serious performance issues. This would also require us to add new structure (physical table) for every new object in our "hierarchy of storage units", which I think is fine. [Question] Could you please let me know if my idea (consider B to be my chosen requirement) is the best I can get? What can I improve? [SAMPLE DATA] Emails Table: id --- 1 2 Cases Table: id --- 1 2 Boxes Table: id --- 1 Hierarchy-relation Table: seq | id_obj | obj_type | id_parent_obj | parent_obj_type | #1 | 1 | Email | 1 | Case | -- email 1 in case 1 #2 | 1 | Case | 1 | Box | -- case 1 in box 1 #3 | 1 | Box | [null] | [null] | -- box 1 no parent #4 | 2 | Email | [null] | [null] | -- email 2 no parent #5 | 3 | Email | 1 | Box | -- email 3 in box 1 #6 | 1 | Box | [null] | [null] | -- box 1 no parent Just by the look on this sample data I see that we have some redundant information for example about the box in Hierarchy-relation table in seq #3 and #6 . I think there is some different approach to this, also the refferential should be kept on seq , I think. We can see though, that case 2 is empty.

sqldatabasedatabase-designdata-modelingdatabase-performance

Reputation: 17177

Modelling database structures for volatile hierarchy of objects

[Note: start]

Hopefully, there are some of you, professionals, that have already had to cope with this kind of situation. The particular case concerning post office is fictional and was made just to present the problem. This question is not about improving performance by adding indexes.

[Note: end]

Recently I've been wondering how to efficiently(!) create a database structure to handle volatile hierarchy levels. Let's get into the example for a better understanding of the matter.

Suppose we have a post office that stores physical mails in different ways depending on really any factor (that doesn't matter here). We are to map this situation onto a database model.

Ok, so we have those mails. Mails can be physically stored in cases, boxes, drawers, safe deposits and many many more (because we don't want to strict our storage types and structure, but instead allow it to be flexible to meet our future changes). This means that for now we have such types, and for instance one mail can be inside a box that is stored in a safe deposit, but there can be really any combination.

For simplicity of this case let's assume that a single mail is the lowest granularity we can get. We have to remember, though, that we may also have for example empty box (that has no objects both beneath and above it) and we also want to store this in DB.

This model has to:

be efficient - it can't (or should it?) all be stored in a single table, since we process many mails per day
be adjustable - we may need to change our way of storing some already existing objects, so we need those grouped hierarchies to change for some objects
be flexible - we are definitely going to need to add new objects in the future and so the hierarchy may change for upcoming and already existing particular objects

[My idea] So far I've come up with this idea:

Let's store all objects in one self-referencing "hierarchy" table and mark each object to be of some type, so that recursively we can see the path of where the email is located, or show every object with hierarchy that on the top depends on safe deposit. This method would require :

a) every record to be detailed in this table, which makes it contain many null values, since mails are not described by the same attributes as a drawer,

b) or each object type might have its own table describing it, with a foreign key in "hierarchy table"

[Warning] This table can grow so big that lookup may cause serious performance issues. This would also require us to add new structure (physical table) for every new object in our "hierarchy of storage units", which I think is fine.

[Question] Could you please let me know if my idea (consider B to be my chosen requirement) is the best I can get? What can I improve?

[SAMPLE DATA]

Emails Table:

id
---
1
2

Cases Table:

id
---
1
2

Boxes Table:

id
---
1

Hierarchy-relation Table:

seq | id_obj | obj_type | id_parent_obj | parent_obj_type |
#1  |  1     | Email    |  1            | Case            | -- email 1 in case 1
#2  |  1     | Case     |  1            | Box             | -- case 1 in box 1
#3  |  1     | Box      |  [null]       | [null]          | -- box 1 no parent
#4  |  2     | Email    |  [null]       | [null]          | -- email 2 no parent
#5  |  3     | Email    |  1            | Box             | -- email 3 in box 1
#6  |  1     | Box      |  [null]       | [null]          | -- box 1 no parent

Just by the look on this sample data I see that we have some redundant information for example about the box in Hierarchy-relation table in seq #3 and #6. I think there is some different approach to this, also the refferential should be kept on seq, I think. We can see though, that case 2 is empty.

Upvotes: 0

Answers (2)

Gottfried Lesigang

Reputation: 67331

Here a complete model (air code, untested...)

CREATE TABLE StorageType(ID INT IDENTITY PRIMARY KEY
                        ,StorageTypeName VARCHAR(100) NOT NULL);
INSERT INTO StorageType VALUES('Box'),('Case'),('OtherStorage');

CREATE TABLE Storage(ID INT IDENTITY PRIMARY KEY
                    ,StorageTypeID INT NOT NULL CONSTRAINT FK_Storage_StorageTypeID FOREIGN KEY REFERENCES StorageType(ID)
                    ,StorageName VARCHAR(100) NOT NULL
                    /*more columns to describe a storage*/
                    );
INSERT INTO Storage VALUES(1,'Box1'),(1,'Box2'),(3,'SomeStorage1');                  


CREATE TABLE ObjectType(ID INT IDENTITY PRIMARY KEY
                       ,ObjectTypeName VARCHAR(100) NOT NULL);
INSERT INTO ObjectType VALUES('Mail'),('OtherItem');

CREATE TABLE MyObject(ID INT IDENTITY PRIMARY KEY
                     ,MyObjectTypeID INT NOT NULL CONSTRAINT FK_MyObject_MyObjectTypeID FOREIGN KEY REFERENCES ObjectType(ID)
                     ,MyObjectName VARCHAR(100) NOT NULL
                     /*more columns to describe an object*/
                     );
INSERT INTO MyObject VALUES(1,'Mail1'),(1,'Mail2'),(2,'SomeObject1');

CREATE TABLE StorageLocation(ID INT IDENTITY PRIMARY KEY
                            ,CreatedOn DATETIME NOT NULL
                            ,OutDate DATETIME NULL
                            ,StorageID INT NOT NULL CONSTRAINT FK_StorageLocation_StorageID FOREIGN KEY REFERENCES Storage(ID) 
                            ,ContainerID INT NULL CONSTRAINT FK_StorageLocation_ContainerID FOREIGN KEY REFERENCES Storage(ID) 
                            /*more columns to describe the "put storage" process: who, when, how long, ... */
                            );

INSERT INTO StorageLocation VALUES(GETDATE(),NULL,2,1); --puts the Box2 into Box1, current, because OutDate IS NULL
                            /*put all storages in their places*/

CREATE TABLE ObjectLocation (ID INT IDENTITY PRIMARY KEY
                            ,CreatedOn DATETIME NOT NULL
                            ,OutDate DATETIME NULL
                            ,MyObjectID INT NOT NULL CONSTRAINT FK_ObjectLocation_MyObjectID FOREIGN KEY REFERENCES MyObject(ID) 
                            ,ContainerID INT NULL CONSTRAINT FK_ObjectLocation_ContainerID FOREIGN KEY REFERENCES Storage(ID) 
                            /*more columns to describe the "put object" process: who, when, how long, ... */
                            );

INSERT INTO ObjectLocation VALUES(GETDATE(),NULL,1,2); --puts the Mail1 into Box2 (which is in Box1), current, because OutDate IS NULL
                            /*put all objects in their places*/

Upvotes: 1

Gottfried Lesigang

Reputation: 67331

You should define your storage types in one "normal" table.

Add all information to this table, which is bound to the storage itself. But nothing about the things stored in there.

If storages can be "nested" (e.g. boxes within bigger boxes within a ...) you could either add a "self-join" (you did not state your RDBMS, SQL Server offers HIERARCHYID for this) to specify the location as a reference to the parent storage, or you could define a Locator table where you store the IDs of a storage and the IDs of its container (1:n -> One storage can be located in exactly one container).

Than you need a table with your items to store. If you have only one kind of item ("mails") you need a table to define a "mail" with all its attributes.

You can either put the location (the place it's stored) as a foreign key to the storages right into the "mail" table (so each mail knows its location), or - again - you can define a mapping with the mail's ID and the storage ID.

The second approach is needed, when you want to add more information to the process "I store a mail" (e.g. who, when, price, ...) and you could historize this. If you change a location you just set a "ValidUntilDatetime" and add a new line with the new location. So you can follow the process of putting things around... If this was just an FK-column within your mails table, this would not be possible.

If there are more items to store, you could think about a Master-table and as many different item tables as you have different things to store. They share an ID. General information (also the location) are part of the master table, specific data part of the sub tables.

Well, I hope this gets you on the right path...

EDIT: Answer your question from your comments: ad "Why need of locator table":

Just image a table

CREATE TABLE storage(id INT, Name VARCHAR(100), ...)

And sample data like

id    Name, ...
1     Box1  ...
2     Box2  ...
3     Case1 ...
[...]

And a table like this

CREATE TABLE Mail( id INT, Creation DATETIME, From VARCHAR(100), To VARCHAR(100),LocationID INT FOREING KEY REFERENCES Storage(id),...)

And sample data like:

id    Creation          From          To            LocationID    ...
1     2015-10-28 12:00  [email protected]  [email protected]      2         ...
2     2015-10-28 12:05  [email protected]  [email protected]      2         ...
3     2015-10-28 12:10  [email protected]  [email protected]      3         ...

This would make clear, that mails 1 and 2 are physically stored in Box2 and mail 3 is stored in Case1. But you have no information about: Who put it there? When was it put there? How long will it stay there? Who will come to pick it?...

If you have a locator table like

CREATE TABLE Location(id INT, mailID INT FOREIGN KEY REFERENCES Mail(id),storageID INT FOREIGN KEY REFERENCES Storage(id), When, ...)

You can store the "put-process". If an item is transfered from Box2 to Box1 you would change the locationID in the first example. But you would not keep the information that this item was in Box2 before, who transfered it, when this happend and so on...

ad "different ways to store": If a mail is stored "no where", you might allow the storageID to remain NULL, or you define a storage item called "Nowhere" and use this ID.

It is enough to store the ID of the direct container. The container itself must know where itself is located. For this read the first part of my answer.

Upvotes: 1

Modelling database structures for volatile hierarchy of objects

Answers (2)

Related Questions