Nathan DeWitt
Nathan DeWitt

Reputation: 6601

How do I efficiently group data in a hierarchical format in T-SQL?

I have data like this:

Task   | Hours
1.1    |    40
2      |    40
2.1    |    60
2.1.1  |    15
15.9   |    24
16     |     5
19.1   |    40
19.1.1 |     8
19.1.2 |    12
19.2   |     6
19.2.1 |    21
19.2.2 |    15
19.2.3 |     2
19.3   |    64

I would like to group based on the first two levels of the Task, producing this result:

Task   | Hours
1.1    |    40
2      |    40
2.1    |    75
15.9   |    24
16     |     5
19.1   |    60
19.2   |    44
19.3   |    64

I want the 16 to not roll up what's beneath it, but I need all the other levels to roll up. This is SQL Server 2005. I would normally do a split on the decimal, and break it out that way, but I was wondering if there's a better way to do it in SQL.

Upvotes: 0

Views: 1556

Answers (4)

Nathan DeWitt
Nathan DeWitt

Reputation: 6601

I was thinking about this on my drive home, and I wanted to propose this solution:

Create a table that stores the hierarchy, and then do a join grabbing the task's parent.

TaskStructureTable:

task  | task_group
1     | 1
1.1   | 1.1
1.1.1 | 1.1
1.1.2 | 1.1
1.1.3 | 1.1
1.2   | 1.2
1.2.1 | 1.2

Then I could do something like this:

SELECT SUM(d.Hours) AS "Hours", t.task_group
FROM Data d
JOIN TaskStructureTable t ON d.Task = t.task

Think this would be faster than doing CHARINDEX? (yes, I can measure and know for sure)

Upvotes: 1

Phil Sandler
Phil Sandler

Reputation: 28046

Is changing the model an option? If your task column is really meant to represent a hierarchy, you should really be representing the hierarchy properly in your relational model.

If the number of levels deep is fixed at three, another option might be to add three columns to represent each of the "parts" of the task column independently.

If that's not an option, I think you can achieve this with a series of CASE statements that parse the string (plus SUM and GROUP BY).

UPDATE:

Ok, this seemed like a fun challenge, so I came up with this:

SELECT
    main_task,
    SUM(hours)
FROM
    (
    SELECT      
        task,
        CASE 
            WHEN 
                LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) = CHARINDEX ('.', task) THEN task
                ELSE LEFT(task, LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) - 1)
            END main_task,
        hours
    FROM 
        #temp
    ) sub
GROUP BY 
      main_task

Upvotes: 2

Sparky
Sparky

Reputation: 15115

Assuming the structure of the field task is consistent, you could use the following

select left(task,4) as Task,sum(hours) as Hours
from table
group by left(task,4)

Here is a slightly modified version

select LEFT(task,charindex('.',task+'.')+1),SUM(hours)
from test1
group by LEFT(task,charindex('.',task+'.')+1)

Upvotes: 1

ChrisLively
ChrisLively

Reputation: 88092

Another route is to add some computed columns which break the various task levels apart, then group and sum as you wish.

Upvotes: 1

Related Questions