A Saraf
A Saraf

Reputation: 275

Hive - Flatten Hierarchy Table into Levels

I have Hierarchy table with Parent Child relationship maximum level 15. I need to find out all the level of child nodes for each parent node. I have tried Recursive query but it is not working in Hive and Impala. Please suggest some query to solve this statement.

+---------------+----------------+
|parent_node_id | child_node_id  |
+---------------+----------------+
|  C1           |    C11         |
+---------------+----------------+
|  C11          |    C12         |
+---------------+----------------+
|  C12          |    123         |
+---------------+----------------+
|  C12          |    456         |
+---------------+----------------+
|  P1           |    C1          |
+---------------+----------------+
|  P1           |    C2          | 
+---------------+----------------+
|  P2           |    C3          | 
+---------------+----------------+


+-------+----------+
| node  | balance  |
+-------+----------+
|  123  |  100     |
+-------+----------+
|  456  |  150     |
+-------+----------+
|  C2   |  400     |
+-------+----------+
|  C3   |  200     |
+-------+----------+

Output Data:

+---------------+-------------+---------+----------+----------+
|Parent_node_id |   Level 1   | Level 2 | Level 2  | Level3   | 
+---------------+-------------+---------+----------+----------+
|  P1           |    C1       | C11     |  C12     |  123     |
+---------------+-------------+---------+----------+----------+
|  P1           |    C1       | C11     |  C12     |  456     |
+---------------+-------------+---------+----------+----------+
|  P1           |    C2       | NULL    |  NULL    |  NULL    |
+---------------+-------------+---------+----------+----------+
|  P2           |    C3       | NULL    |  NULL    |  NULL    |
+---------------+-------------+---------+----------+----------+


+-------+----------+
| node  | balance  |
+-------+----------+
|  123  |  100     |
+-------+----------+
|  456  |  150     |
+-------+----------+
|  C12  |  250     |
+-------+----------+
|  C11  |  250     |
+-------+----------+
|  C1   |  250     |
+-------+----------+
|  P1   |  650     |
+-------+----------+
|  C2   |  200     |
+-------+----------+
|  P2   |  200     |
+-------+----------+

Upvotes: 0

Views: 2693

Answers (1)

Vijiy
Vijiy

Reputation: 1197

Since you know the maximum relationships that can be in hierarchy,

Below would be the query -

hive> select * from hier;
OK
parent child child_desc
C1      C11     Party
C1      C111    Placement
C111    C1111   XM
C12     C121    Brand
C2      C12     Equity
XN12    C1      Private
XN12    C2      Public
Time taken: 0.123 seconds, Fetched: 7 row(s)

Query -

with lvl1 as
(
select p1.parent, p1.child child1, p1.child_desc child_desc1, p2.child child2, p2.child_desc child_desc2
from hier p1 left outer join hier p2
on p1.child = p2.parent
)
, lvl2 as
(
select lvl1.parent, lvl1.child1, lvl1.child_desc1, lvl1.child2 child2, lvl1.child_desc2, p3.parent p2, 
p3.child as child3, p3.child_desc child_desc3
from lvl1 left outer join hier p3 on nvl(lvl1.child2,'A') = nvl(p3.parent,'A')
)
, lvl3 as
(
select lvl2.parent, lvl2.child1 child1, lvl2.child_desc1, lvl2.child2 child2, lvl2.child_desc2, 
lvl2.child3 hild3, lvl2.child_desc3, p3.child as child4, p3.child as child_desc4
from lvl2 left outer join hier p3 on nvl(lvl2.child3,'A') = nvl(p3.parent,'A')
)
select * from lvl3;

C1      C11     Party   NULL    NULL    NULL    NULL    NULL    NULL
C1      C111    Placement       C1111   XM      NULL    NULL    NULL    NULL
C111    C1111   XM      NULL    NULL    NULL    NULL    NULL    NULL
C12     C121    Brand   NULL    NULL    NULL    NULL    NULL    NULL
C2      C12     Equity  C121    Brand   NULL    NULL    NULL    NULL
XN12    C1      Private C11     Party   NULL    NULL    NULL    NULL
XN12    C1      Private C111    Placement       C1111   XM      NULL    NULL
XN12    C2      Public  C12     Equity  C121    Brand   NULL    NULL

Above query is for level-3, you can replicate it to level-15.

I have tested the above query for 3 levels.

hive> select * from hier;
OK
parent child
C1      C11
C11     C12
C12     123
P1      C1
P2      C2

o/p -

C1      C11     C12     123     NULL
C11     C12     123     NULL    NULL
C12     123     NULL    NULL    NULL
P1      C1      C11     C12     123
P2      C2      NULL    NULL    NULL

If you want the query for n-levels and you don't want to hard code the query, that would require some scripting (with good amount of work) to create the query or to iterate the same query multiple times.

Hope this helps

Upvotes: 1

Related Questions