Alexander Høst
Alexander Høst

Reputation: 1028

T-SQL - Complicated recursion with sum

I apologize for the long problem description, but I was unable to break it down more than this. Before reading on, keep in mind that my end goal is T-SQL (maybe some recursive CTE?). However, a shove in the right direction would be much appreciated (I've tried a million things and have been scratching my head for hours).

Consider the following problem: I have a table of categories which is self-referencing through ParentCategoryID->CategoryID:

-----------------------
|       Category      |
-----------------------
|   *CategoryID       |
|   Name              |
|   ParentCategoryID  |
-----------------------

This type of table allows me to build a tree structure, say:

             -----------------  
             | parentCategory|
             -----------------
             /       |       \
           child(1) child   child
           /   \
      child(2) child(3)

where "child" means "child category" (ignore the numbers, I'll explain them later). Obviously I can have as many children as I like at any level.

Every day, a program I've written stores values to a table "ValueRegistration", which is connected to "Category" like so:

------------------------        ----------------------          ----------------------
|   ValueRegistration  |        |       Item         |          |     Category       |
------------------------        ----------------------          ----------------------
|    *RegID            |        |   *ItemID          |          |   *CategoryID      |
|    Date              |>-------|   CategoryID       |>---------|   Name             |
|    ItemID            |        |   ItemTypeID       |          |   ParentCategoryID |
|    Value             |        ----------------------          ----------------------
------------------------                  Y
                                          |
                                          |
                                ---------------------
                                |   ItemType        |
                                ---------------------
                                |  *ItemTypeID      |
                                |   ItemType        |
                                ---------------------

As you can see, a ValueRegistration concerns a specific Item, which in turn belongs to a certain category. The category may or may not have a parent (and grandparents and great-grandparents and so on). For instance, it may be the child all the way to the bottom left (number 2) in the tree I illustrated above. Also, an Item is of a certain ItemType.

My goal:
I register values to the ValueRegistration table daily (in other words, Date and ItemID combined is also a primary key). I want to be able to retrieve a resultset on the following form:

[ValueRegistration.Date, ItemType.ItemTypeID, Category.CategoryID, Value]

which seems simple enough (it's obviously just a bunch of joins). However, I also want results for rows that actually don't exist in the ValueRegistration table, namely results in which the values of sibling nodes for a given date and itemID are summed, and a new row is produced where ValueRegistration.Date and ItemType.ItemTypeID are the same as in the child nodes but where CategoryID is that of the parent of the child nodes. Keep in mind that an Item will NOT exist for this type of row in the resultset.

Consider for instance a scenario where I have ValueRegistrations for child 2 and 3 on a bunch of dates and a bunch of ItemIDs. Obviously, each registration belongs to a certain ItemType and Category. It should be clear to the reader that

ValueRegistration.Date, ItemType.ItemTypeID, Category.CategoryID

is a sufficient key to identify a specific ValueRegistration (in other words, it's possible to solve my problem without having to create temporary Item rows), and so I can inner join all tables and, for instance, the following result:

ValueReg.Date, ItemType.ItemTypeID, Category.CategoryID, ValueReg.Value
08-mar-2013, 1, 5, 200
08-mar-2013, 1, 6, 250

Assume now that I have four category rows that look like this:

1, category1, NULL
2, category2, 1
5, category5, 2
6, category6, 2

I.e. category 1 is the parent of category 2, and category 2 is the parent of categories 5 and 6. Category 1 has no parent. I now wish to append the following rows to my resultset:

08-mar-2013, 1, 2, (200+250)
08-mar-2013, 1, 1, (200+250+sum(values in all other childnodes of node 1)

Remember:

  1. the solution needs to be recursive, so that it is performed upwards in the tree (until NULL is reached)
  2. an Item row will NOT exist for tree nodes which are calculated, so CategoryID and ItemTypeID must be used
  3. yes, I know I could simply create "virtual" Item rows and add ValueRegistrations when I originally INSERT INTO my database, but that's solution is prone to errors, particularly if other programmers code up against my database but either forget or are unaware that results must be passed up to parent node. A solution that calculates this on request instead is much safer and, frankly, much more elegant.

I've tried to set something up along the lines of this, but I seem to get stuck with having to group by Date and ItemTypeID, and that's not allowed in a CTE. The programmer in me just wants to make a recursive function, but I'm really struggling to do that in SQL.

Anyone have an idea where to begin, what things I should try, or even (fingers crossed) a solution?

Thanks!

Alexander

EDIT: SQL FIDDLE

CREATE TABLE ItemType(
  ItemTypeID INT PRIMARY KEY,
  ItemType VARCHAR(50)
);


CREATE TABLE Category(
  CategoryID INT PRIMARY KEY,
  Name VARCHAR(50),
  ParentCategoryID INT,
  FOREIGN KEY(ParentCategoryID) REFERENCES Category(CategoryID)
  );

CREATE TABLE Item(
  ItemID INT PRIMARY KEY,
  CategoryID INT NOT NULL,
  ItemTypeID INT NOT NULL,
  FOREIGN KEY(CategoryID) REFERENCES Category(CategoryID),
  FOREIGN KEY(ItemTypeID) REFERENCES ItemType(ItemTypeID)
  );

CREATE TABLE ValueRegistration(
  RegID INT PRIMARY KEY,
  Date DATE NOT NULL,
  Value INT NOT NULL,
  ItemID INT NOT NULL,
  FOREIGN KEY(ItemID) REFERENCES Item(ItemID)
  );

INSERT INTO ItemType VALUES(1, 'ItemType1'); 
INSERT INTO ItemType VALUES(2, 'ItemType2');

INSERT INTO Category VALUES(1, 'Category1', NULL);   -- Top parent (1)
INSERT INTO Category VALUES(2, 'Category2', 1);      -- A child of 1
INSERT INTO Category VALUES(3, 'Category3', 1);      -- A child of 1
INSERT INTO Category VALUES(4, 'Category4', 2);      -- A child of 2
INSERT INTO Category VALUES(5, 'Category5', 2);      -- A child of 2
INSERT INTO Category VALUES(6, 'Category6', NULL);   -- Another top parent

INSERT INTO Item VALUES(1, 4, 1);    -- Category 4, ItemType 1
INSERT INTO Item VALUES(2, 5, 1);    -- Category 5, ItemType 1
INSERT INTO Item VALUES(3, 3, 1);    -- Category 3, ItemType 1
INSERT INTO Item VALUES(4, 1, 2);    -- Category 1, ItemType 2

INSERT INTO ValueRegistration VALUES(1, '2013-03-08', 100, 1);
INSERT INTO ValueRegistration VALUES(2, '2013-03-08', 200, 2);
INSERT INTO ValueRegistration VALUES(3, '2013-03-08', 300, 3);
INSERT INTO ValueRegistration VALUES(4, '2013-03-08', 400, 4);
INSERT INTO ValueRegistration VALUES(5, '2013-03-09', 120, 1);
INSERT INTO ValueRegistration VALUES(6, '2013-03-09', 220, 2);
INSERT INTO ValueRegistration VALUES(7, '2013-03-09', 320, 3);
INSERT INTO ValueRegistration VALUES(8, '2013-03-09', 420, 4);

-- -------------------- RESULTSET I WANT ----------------------
--  vr.Date    | ItemType    | CategoryTypeID  |  Value
-- ------------------------------------------------------------
--  2013-03-08 | 'ItemType1' | 'Category4'     | 100              Directly available
--  2013-03-08 | 'ItemType1' | 'Category5'     | 200              Directly available
--  2013-03-08 | 'ItemType1' | 'Category3'     | 300              Directly available
--  2013-03-08 | 'ItemType1' | 'Category2'     | 100+200          Calculated tree node
--  2013-03-08 | 'ItemType1' | 'Category1'     | 100+200+300      Calculated tree node
--  2013-03-08 | 'ItemType2' | 'Category1'     | 400              Directly available
--  2013-03-09 | 'ItemType1' | 'Category4'     | 120              Directly available
--  2013-03-09 | 'ItemType1' | 'Category5'     | 220              Directly available
--  2013-03-09 | 'ItemType1' | 'Category3'     | 320              Directly available
--  2013-03-09 | 'ItemType1' | 'Category2'     | 120+220          Calculated tree node
--  2013-03-09 | 'ItemType1' | 'Category1'     | 120+220+320      Calculated tree node
--  2013-03-09 | 'ItemType2' | 'Category1'     | 420              Directly available

Upvotes: 0

Views: 458

Answers (1)

Pieter Geerkens
Pieter Geerkens

Reputation: 11883

If you replace all joins to the table Category with joins to this dynamic relation, you will get the hierarchy you are lookinfg for:

with Category as (
  select * from ( values
    (1,'Fred',null),
    (2,'Joan',1),
    (3,'Greg',2),
    (4,'Jack',2),
    (5,'Jill',4),
    (6,'Bill',3),
    (7,'Sam',6)
  ) Category(CategoryID,Name,ParentCategoryID)
)
, Hierarchy as (
  select 0 as [Level],* from Category
--  where Parent is null

  union all

  select super.[Level]+1, sub.CategoryID, super.Name, super.ParentCategoryID
  from Category as sub
  join Hierarchy as super on super.CategoryID = sub.ParentCategoryID and sub.ParentCategoryID is not null
) 
select * from Hierarchy
-- where CategoryID = 6
-- order by [Level], CategoryID

For example, uncommenting the two lines at the bottom will yield this result set:

Level       CategoryID  Name ParentCategoryID
----------- ----------- ---- ----------------
0           6           Bill 3
1           6           Greg 2
2           6           Joan 1
3           6           Fred NULL

Upvotes: 1

Related Questions