Vitaly
Vitaly

Reputation: 83

How to get sum of the children nodes on every level in an hierarchical tree?

I have table "A" with hierarchical data like this:

create table dictionary_a
(
  id number not null,
  parent_id number,
  c_name varchar2(50),
  constraint pk_dictionary primary key (id),
  constraint fk_dictionary foreign key (parent_id) references dictionary_a (id)
);
id parent_id c_name
1            name1
2  1         name2
3  1         name3
4  3         name4
5  3         name5
6  2         name6
7  6         name7
...

(actual hierarchical data table has 7 levels, but this is subject to change)

And table "B" with data i need to sum up:

create table numeric_data
(
  dict_id number not null,
  n_sum number,
  constraint fk_numeric_data foreign key (dict_id) references dictionary_a (id)
);
dict_id n_sum
1       36.0
2       20.0
3       16.0
4       10.5
5       5.5
7       20.0
...

Note that higher level nodes also have sums tied to them.

I need to get the sum of all child nodes at each level and compare them with the actual data from the column n_sum (this column is populated by users and my job is to find all the inconsistencies):

dict_id n_sum actual_sum c_name
1       36.0  36.0       name1
2       20.0  20.0       name2
3       16.0  16.0       name3
4       10.5  10.5       name4
5       5.5   5.5        name5
6             20.0       name6
7       20.0  20.0       name7

I searched Internet, but what I could find was very closely related to specific problems, no general solution.

Test data:

insert into dictionary_a (ID, PARENT_ID, C_NAME)
values (1, null, 'Department 1');

insert into dictionary_a (ID, PARENT_ID, C_NAME)
values (2, 1, 'Department 2');

insert into dictionary_a (ID, PARENT_ID, C_NAME)
values (3, 1, 'Department 3');

insert into dictionary_a (ID, PARENT_ID, C_NAME)
values (4, 3, 'Department 4');

insert into dictionary_a (ID, PARENT_ID, C_NAME)
values (5, 3, 'Department 5');

insert into dictionary_a (ID, PARENT_ID, C_NAME)
values (6, 2, 'Department 6');

insert into dictionary_a (ID, PARENT_ID, C_NAME)
values (7, 6, 'Department 7');

insert into numeric_data (DICT_ID, N_SUM)
values (1, 36);

insert into numeric_data (DICT_ID, N_SUM)
values (2, 20);

insert into numeric_data (DICT_ID, N_SUM)
values (3, 16);

insert into numeric_data (DICT_ID, N_SUM)
values (4, 10.5);

insert into numeric_data (DICT_ID, N_SUM)
values (5, 5.5);

insert into numeric_data (DICT_ID, N_SUM)
values (7, 20);

commit;

I am working on Oracle 18c.

Upvotes: 1

Views: 787

Answers (3)

Marmite Bomber
Marmite Bomber

Reputation: 21063

I need to get the sum of all child nodes at each level and compare them with the actual data from the column n_sum

There is no need to use hierarchical query if you only compares each parent with the sum of it's child:

So first outer join to your number table twice, once for id and once for parent_id.

The sum of all child nodes is as easy as an analytic SUM over parent_id.

Than simple select all rows where the child_sum does not match the node sum.

WITH dt AS (
select da.id, da.parent_id, da.c_name, 
sum(nd.n_sum) OVER (partition by da.parent_id) as child_sum, 
ndp.n_sum as id_sum
from dictionary_a da
left join numeric_data nd on nd.dict_id = da.id
left join numeric_data ndp on ndp.dict_id = da.parent_id
WHERE parent_id IS NOT NULL)
SELECT * FROM dt
WHERE nvl(child_sum,0) != nvl(id_sum,0)

As expected you get two problems

  • for parent 2 the child sum is null but the node sum is 20 and
  • for parent 6 the child sum is 20 but the node sum is null.

Upvotes: 0

MT0
MT0

Reputation: 167962

It is unclear what your expected output is since you are generating random data; however, to solve the problem:

I need to get the sum of all child nodes at each level

You can generate all the child nodes and use CONNECT_BY_ROOT to keep a record of the root id of the hierarchy; then you can sum the values to get the total:

SELECT root_id,
       MAX(c_name),
       SUM(n_sum) AS total
FROM   (
  SELECT CONNECT_BY_ROOT(id) AS root_id,
         CONNECT_BY_ROOT(c_name) AS c_name,
         n.n_sum
  FROM   dictionary_a d
         INNER JOIN numeric_data n
         ON (d.id = n.dict_id)
  CONNECT BY PRIOR d.id = d.parent_id
)
GROUP BY root_id
ORDER BY root_id

db<>fiddle here


What you appear to want is not to sum all the child nodes but sum all the leaf nodes:

SELECT root_id,
       MAX(c_name) AS c_name,
       MAX(root_sum) As n_sum,
       SUM(n_sum) AS total
FROM   (
  SELECT CONNECT_BY_ROOT id AS root_id,
         CONNECT_BY_ROOT c_name AS c_name,
         CONNECT_BY_ROOT n_sum AS root_sum,
         d.id,
         n.n_sum
  FROM   dictionary_a d
         LEFT OUTER JOIN numeric_data n
         ON (d.id = n.dict_id)
  WHERE  CONNECT_BY_ISLEAF = 1
  CONNECT BY PRIOR d.id = d.parent_id
)
GROUP BY root_id
ORDER BY root_id

Which, for your (non-random) sample data, outputs:

ROOT_ID C_NAME N_SUM TOTAL
1 name1 36 36
2 name2 20 20
3 name3 16 16
4 name4 10.5 10.5
5 name5 5.5 5.5
6 name6 null 20
7 name7 20 20

db<>fiddle here

Upvotes: 2

Alex Poole
Alex Poole

Reputation: 191265

You can use an outer join between your tables:

select da.id, da.parent_id, da.c_name, coalesce(nd.n_sum, 0) as n_sum
from dictionary_a da
left join numeric_data nd on nd.dict_id = da.id;

and then use that as a source for a hierarchical query, tracking the root ID, name and quantity:

select id,
  parent_id,
  n_sum,
  connect_by_root id as root_id,
  connect_by_root n_sum as root_n_sum,
  connect_by_root c_name as root_c_name,
  connect_by_isleaf as isleaf
from (
  select da.id, da.parent_id, da.c_name, coalesce(nd.n_sum, 0) as n_sum
  from dictionary_a da
  left join numeric_data nd on nd.dict_id = da.id
)
connect by parent_id = prior id;

and then sum the leaf nodes to get the values you seem to want:

with cte as (
  select id,
    parent_id,
    n_sum,
    connect_by_root id as root_id,
    connect_by_root n_sum as root_n_sum,
    connect_by_root c_name as root_c_name,
    connect_by_isleaf as isleaf
  from (
    select da.id, da.parent_id, da.c_name, coalesce(nd.n_sum, 0) as n_sum
    from dictionary_a da
    left join numeric_data nd on nd.dict_id = da.id
  )
  connect by parent_id = prior id
)
select root_id as dict_id,
  root_n_sum as n_sum,
  sum(n_sum) as actual_sum,
  root_c_name as c_name
from cte
where isleaf = 1
group by root_id, root_n_sum, root_c_name
order by root_id;

Which with your explicit sample data gives:

DICT_ID N_SUM ACTUAL_SUM C_NAME
1 36 36 name1
2 20 20 name2
3 16 16 name3
4 10.5 10.5 name4
5 5.5 5.5 name5
6 0 20 name6
7 20 20 name7

I've included coalesce(nv.n_sum, 0) so the 'original' n_sum value for ID 6 shows as zero rather than null, which your example doesn't have; it will show null if you just remove the coalesce, but including it means you can add a simple

having root_n_sum != sum(n_sum)

clause to only see the discrepancies. That clause just becomes more complicated if you leave nulls alone, but it might be preferable:

with cte as (
  select id,
    parent_id,
    n_sum,
    connect_by_root id as root_id,
    connect_by_root n_sum as root_n_sum,
    connect_by_root c_name as root_c_name,
    connect_by_isleaf as isleaf
  from (
    select da.id, da.parent_id, da.c_name, nd.n_sum
    from dictionary_a da
    left join numeric_data nd on nd.dict_id = da.id
  )
  connect by parent_id = prior id
)
select root_id as dict_id,
  root_n_sum as n_sum,
  sum(n_sum) as actual_sum,
  root_c_name as c_name
from cte
where isleaf = 1
group by root_id, root_n_sum, root_c_name
having (root_n_sum is null and sum(n_sum) is not null)
or (root_n_sum is not null and sum(n_sum) is null)
or root_n_sum != sum(n_sum)
order by root_id;

gives just:

DICT_ID N_SUM ACTUAL_SUM C_NAME
6 null 20 name6

db<>fiddle

Upvotes: 3

Related Questions