Rasmus
Rasmus

Reputation: 2933

Recursive sum in tree structure

I have a tree struture in a single table. The table is a tree of categories that can be nested endlessly. Each category has a ProductCount column that tells how many products are directly in the category (not summing child categories).

Id  | ParentId | Name      | ProductCount
------------------------------------
1   | -1       | Cars      | 0
2   | -1       | Bikes     | 1
3   | 1        | Ford      | 10
4   | 3        | Mustang   | 7
5   | 3        | Focus     | 4

I would like to make a sql query that for each row/category gives me the number of products including the ones in the child categories.

The output for the table above should be

Id  | ParentId | Name      | ProductCount | ProductCountIncludingChildren
--------------------------------------------------------------------------
1   | -1       | Cars      | 0            | 21
2   | -1       | Bikes     | 1            | 1
3   | 1        | Ford      | 10           | 21
4   | 3        | Mustang   | 7            | 7
5   | 3        | Focus     | 4            | 4

I know I probably should use CTE, but cant quite get it working the way it should.

Any help is appreciated!

Upvotes: 26

Views: 24952

Answers (5)

Mikael Eriksson
Mikael Eriksson

Reputation: 138960

You can use a recursive CTE where you in the anchor part get all rows and in the recursive part join to get the child rows. Remember the original Id aliased RootID from the anchor part and do sum aggregate in the main query grouped by RootID.

SQL Fiddle

MS SQL Server 2012 Schema Setup:

create table T
(
  Id int primary key,
  ParentId int,
  Name varchar(10),
  ProductCount int
);

insert into T values
(1, -1, 'Cars',    0),
(2, -1, 'Bikes',   1),
(3,  1, 'Ford',    10),
(4,  3, 'Mustang', 7),
(5,  3, 'Focus',   4);

create index IX_T_ParentID on T(ParentID) include(ProductCount, Id);

Query 1:

with C as
(
  select T.Id,
         T.ProductCount,
         T.Id as RootID
  from T
  union all
  select T.Id,
         T.ProductCount,
         C.RootID
  from T
    inner join C 
      on T.ParentId = C.Id
)
select T.Id,
       T.ParentId,
       T.Name,
       T.ProductCount,
       S.ProductCountIncludingChildren
from T
  inner join (
             select RootID,
                    sum(ProductCount) as ProductCountIncludingChildren
             from C
             group by RootID
             ) as S
    on T.Id = S.RootID
order by T.Id
option (maxrecursion 0)

Results:

| ID | PARENTID |    NAME | PRODUCTCOUNT | PRODUCTCOUNTINCLUDINGCHILDREN |
|----|----------|---------|--------------|-------------------------------|
|  1 |       -1 |    Cars |            0 |                            21 |
|  2 |       -1 |   Bikes |            1 |                             1 |
|  3 |        1 |    Ford |           10 |                            21 |
|  4 |        3 | Mustang |            7 |                             7 |
|  5 |        3 |   Focus |            4 |                             4 |

Upvotes: 30

Tom Hunter
Tom Hunter

Reputation: 5918

Actually this could be a good use of HIERARCHYID in SQL Server..

CREATE TABLE [dbo].[CategoryTree]
(
    [Id] INT,
    [ParentId] INT,
    [Name] VARCHAR(100),
    [ProductCount] INT
)
GO

INSERT [dbo].[CategoryTree]
VALUES
    (1, -1, 'Cars', 0),
    (2, -1, 'Bikes', 1),
    (3, 1, 'Ford', 10),
    (4, 3, 'Mustang', 7),
    (5, 3, 'Focus', 4)
    --,(6, 1, 'BMW', 100)
GO

Query

WITH [cteRN] AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY [ParentId] ORDER BY [ParentId]) AS [ROW_NUMBER]
    FROM  [dbo].[CategoryTree]
),
[cteHierarchy] AS (
    SELECT CAST(
            CAST(hierarchyid::GetRoot() AS VARCHAR(100))
            + CAST([ROW_NUMBER] AS VARCHAR(100))
            + '/' AS HIERARCHYID
        ) AS [Node],
        *
    FROM [cteRN]
    WHERE [ParentId] = -1
    UNION ALL
    SELECT CAST(
            hierarchy.Node.ToString()
            + CAST(RN.[ROW_NUMBER] AS VARCHAR(100)
        ) + '/' AS HIERARCHYID),
        rn.*
    FROM [cteRN] rn
    INNER JOIN [cteHierarchy] hierarchy
        ON rn.[ParentId] = hierarchy.[Id]
)
SELECT x.[Node].ToString() AS [Node],
    x.[Id], x.[ParentId], x.[Name], x.[ProductCount],
    x.[ProductCount] + SUM(ISNULL(child.[ProductCount],0))
        AS [ProductCountIncludingChildren]
FROM [cteHierarchy] x
LEFT JOIN [cteHierarchy] child
    ON child.[Node].IsDescendantOf(x.[Node]) = 1
    AND child.[Node] <> x.[Node]
GROUP BY x.[Node], x.[Id], x.[ParentId], x.[Name], x.[ProductCount]
ORDER BY x.[Id]

Result

Results screenshot

Upvotes: 1

Jerrad
Jerrad

Reputation: 5290

This is the same concept as Tom's answer, but less code (and way faster).

with cte as
(
  select v.Id, v.ParentId, v.Name, v.ProductCount, 
  cast('/' + cast(v.Id as varchar) + '/' as varchar) Node
  from Vehicle v
  where ParentId = -1
  union all
  select v.Id, v.ParentId, v.Name, v.ProductCount,  
  cast(c.Node + CAST(v.Id as varchar) + '/' as varchar)
  from Vehicle v
  join cte c on v.ParentId = c.Id
)

select c1.Id, c1.ParentId, c1.Name, c1.ProductCount, 
c1.ProductCount + SUM(isnull(c2.ProductCount, 0)) ProductCountIncludingChildren
from cte c1
left outer join cte c2 on c1.Node <> c2.Node and left(c2.Node, LEN(c1.Node)) = c1.Node
group by c1.Id, c1.ParentId, c1.Name, c1.ProductCount
order by c1.Id

SQL Fiddle (I added some extra data rows for testing)

Upvotes: 8

brumScouse
brumScouse

Reputation: 3216

This wont be optimal but it works, however it involves 2 CTEs. 1 main CTE and a CTE in a table valued function to sum up the values for each sub tree.

The first CTE

;WITH cte 
AS 
(
SELECT 
   anchor.Id,
   anchor.ParentId,
   anchor.Name,
   anchor.ProductCount,
   s.Total AS ProductCountIncludingChildren
FROM
testTable anchor 
    CROSS APPLY SumChild(anchor.id) s
WHERE anchor.parentid = -1
UNION ALL
SELECT 
   child.Id,
   child.ParentId,
   child.Name,
   child.ProductCount,
   s.Total AS ProductCountIncludingChildren
  FROM
cte 
  INNER JOIN testTable child on child.parentid = cte.id
  CROSS APPLY SumChild(child.id) s
 )
 SELECT * from cte 

AND the function

CREATE FUNCTION SumChild 
(
@id int

)
RETURNS TABLE
AS
 RETURN  
 (
 WITH cte 
 AS 
 (
   SELECT 
     anchor.Id,
     anchor.ParentId,
     anchor.ProductCount
   FROM
      testTable anchor 
   WHERE anchor.id = @id 
   UNION ALL
SELECT 
      child.Id,
      child.ParentId,
      child.ProductCount
    FROM
   cte 
     INNER JOIN testTable child on child.parentid = cte.id
)
SELECT SUM(ProductCount) AS Total from CTE
 )
GO

Which results in:

Results in SSMS

from the source table

Source table

Apologies about formatting.

Upvotes: 0

Dave.Gugg
Dave.Gugg

Reputation: 6771

I couldn't come up with a good T-SQL, set based answer, but I did come up with an answer: The temp table mimics your table structure. The table variable is a work table.

--Initial table
CREATE TABLE #products (Id INT, ParentId INT, NAME VARCHAR(255), ProductCount INT)
INSERT INTO #products
        ( ID,ParentId, NAME, ProductCount )
VALUES  ( 1,-1,'Cars',0),(2,-1,'Bikes',1),(3,1,'Ford',10),(4,3,'Mustang',7),(5,3,'Focus',4)

--Work table
DECLARE @products TABLE (ID INT, ParentId INT, NAME VARCHAR(255), ProductCount INT, ProductCountIncludingChildren INT)
INSERT INTO @products
        ( ID ,
          ParentId ,
          NAME ,
          ProductCount ,
          ProductCountIncludingChildren
        )
SELECT  Id ,
        ParentId ,
        NAME ,
        ProductCount,
        0
FROM #products

DECLARE @i INT
SELECT @i = MAX(id) FROM @products

--Stupid loop - loops suck
WHILE @i > 0
    BEGIN
        WITH cte AS (SELECT ParentId, SUM(ProductCountIncludingChildren) AS ProductCountIncludingChildren FROM @products GROUP BY ParentId)
        UPDATE p1
        SET p1.ProductCountIncludingChildren = p1.ProductCount + isnull(p2.ProductCountIncludingChildren,0)
        FROM @products p1
        LEFT OUTER JOIN cte p2 ON p1.ID = p2.ParentId
        WHERE p1.ID = @i

        SELECT @i = @i - 1
    END

SELECT *
FROM @products

DROP TABLE #products

I'd be very interested to see a better, set based approach. The problem I ran into is that when you use recursive cte's, you start with the parent and work toward the children - this doesn't really work for getting a sum at the parent levels. You'd have to do some kind of backward recursive cte.

Upvotes: -1

Related Questions