ZeroMargin
ZeroMargin

Reputation: 31

Transform Hierarchy Tree into Parent/Child Table with Excel Formulas

I am trying to transform a hierarchy with 5 Levels into a Parent/Child table with 2 columns.

I need to do this with Excel formulas, not VBA script. Can I do this with Index and Match functions?

The Parent/Child Table should be dynamic and update automatically if Hierarchy changes.

A B C D E
1 H I E R A R C H Y
2 Universe
3 North America
4 USA
5 California
6 San Francisco
7 Los Angeles
8 Montana
9 Mexico
10 Canada
11 Europe
12 Italy
13 Spain
France

I was able to populate the Children with the array formula below, i.e. North America in B2:

{=INDEX(A3:E3,MATCH(FALSE,ISBLANK(A3:E3),0))}

However I am looking for a formula to populate the Parents.

Expected Parent / Child Table:

A B
1 Parent Child
2 Universe North America
3 North America USA
4 USA California
5 California San Francisco
6 California Los Angeles
7 USA Montana
8 North America Mexico
9 North America Canada
10 Universe Europe
11 Europe Italy
12 Europe Spain
13 Europe France

Upvotes: 3

Views: 4078

Answers (1)

Robin Mackenzie
Robin Mackenzie

Reputation: 19319

I think the INDEX and MATCH method that works for the children will not work for the parents because:

  • Your data structure implies that there is a single piece of information per row therefore {=INDEX(A3:E3,MATCH(FALSE,ISBLANK(A3:E3),0))} always works for a child value

  • But, the parent for that child value is the value in the prior column and an unknown number of rows above the row of the child. So, because the INDEX/MATCH approach does not return the location of the child in order that we can figure out the prior column that needs to be identified and addressed etc.

We can think about the parent value being in row of the column with the maximum index of a non-blank values up to and including that row.

E.g. for Spain the range to check for the parent is:

enter image description here

E.g. for Los Angeles the range to check for the parent is:

enter image description here

So to establish the correct range to check for the parent value for a child, you need to offset by -1 column and from the min row (2) to the max row (the row of the child).

To do this we need to get:

  1. The row/ column coordinates of the children (and therefore we can also know the child value)
  2. The min and max row of the prior column (the parent must exist in a non-blank value in that column)
  3. Resolve values via an INDIRECT(ADDRESS(...)) approach against row/ column indices

So it will end up like this:

enter image description here

Where:

Child addresses

  • C_row: =ROW(A2:E2) is simply the row of the hierarchy
  • C_col: =SUMPRODUCT(--NOT(ISBLANK(A2:E2)),COLUMN(A2:E2)) is a non-array formula version of your formula that returns the index of the non-blank cell instead of the value itself
  • C_add: =ADDRESS(G2,H2) is the address of the child value per C_row and C_col above

Parent addresses

  • P_col: =H2-1 is the prior column to the child column (C_col); which we know has the parent value
  • P_add_min: =IF(J2>0,ADDRESS(2,J2),"zzz") is the address of the minimum row of the range containing the parent, with a condition to identify the root of the hierarchy (i.e. Universe has no parent)
  • P_add_max: =IF(J2>0,ADDRESS(G2,J2),"zzz") is the address of the maximum row of the range containing the parent, with the same condition as P_add_min re root of the hierarchy
  • P_row: =IF(J2>0,AGGREGATE(14,4,(NOT(ISBLANK(INDIRECT(K2&":"&L2)))*ROW(INDIRECT(K2&":"&L2))),1),"zzz") says where the child is not the root of the hierarchy, get the LARGEst index of a non-blank value in the prior column to the child between the min and max rows established above

Expected output

  • Parent: =IF(J2>0,INDIRECT(ADDRESS(M2,J2)),"zzz") gets the value at the address given by P_row and P_col
  • Child: =INDIRECT(I2) gets the value at the address given by C_add

Extensibility

The method above accounts for a hierarchy of arbitrary number of components where the maximum depth of the hierarchy is 5 (e.g. leaves in column E).

If you want to have arbitrary depth as well then you need to separate the helper columns and output columns into a different sheet e.g.

  • C_row: =ROW(Sheet1!2:2)
  • C_col: =SUMPRODUCT(--NOT(ISBLANK(Sheet1!2:2)),COLUMN(Sheet1!2:2))
  • C_add: =ADDRESS(A2,B2,1,1,"Sheet1")
  • P_col: =B2-1
  • P_add_min: =IF(D2>0,ADDRESS(2,D2),"zzz")
  • P_add_max: =IF(D2>0,ADDRESS(A2,D2),"zzz")
  • P_row: =IF(D2>0,AGGREGATE(14,4,(NOT(ISBLANK(INDIRECT("Sheet1!"&E2&":"&F2)))*ROW(INDIRECT("Sheet1!"&E2&":"&F2))),1),"zzz")
  • Parent: =IF(D2>0,INDIRECT(ADDRESS(G2,D2,1,1,"Sheet1")),"zzz")
  • Child: =INDIRECT(C2)

And you can go ahead and rename zzz just as "" to tidy things up.

HTH

Upvotes: 1

Related Questions