Reputation: 53
I want dynamically flatten a parent child hierarchy pandas dataframe.
Notes:
Input example:
import pandas as pd
import numpy as np
pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.expand_frame_repr = False
pd.options.mode.chained_assignment = None
df = pd.DataFrame(
{
"child": ["xyz", "opr", "axz", "asd", "asd", "zxc", "zxc", "zxc"],
"parent": [np.nan, "xyz", "xyz", "opr", "opr", "opr", "axz", "xyz"],
}
)
print(df)
Expected output:
level_0 level_1 leaf
0 xyz opr asd
1 xyz opr asd
2 xyz opr zxc
3 xyz axz zxc
4 xyz NaN zxc
Upvotes: 1
Views: 2679
Reputation: 148870
The leafs are the elements from the child
column that do not exist in parent
one.
Once this is done, I would iterate adding a new parent column on every pass until all parents are NaN. And another trick is required to ensure that the last level will contain the ultimate grand_parents: if a parent column contains a NaN, the values have to be switched with the previous column. Code:
result = df[~df['child'].isin(df['parent'])]
result.columns = ['leaf', 'lev_1']
ix = 1
while True:
result = result.merge(df, 'left', left_on=f'lev_{ix}', right_on='child'
).drop(columns='child')
if (result['parent'].isna().all()):
result = result.drop(columns='parent')
break
result.loc[result['parent'].isna(), f'lev_{ix}':'parent'
] = result[result['parent'].isna()][['parent', f'lev_{ix}']
].values
print(result)
ix += 1
result = result.rename(columns={'parent': f'lev_{ix}'})
# rename and reorder columns to match your expected result
result = result.rename(columns={f'lev_{ix-i}': f'lev_{i}' for i in range(ix)}
).reindex(columns=[f'lev_{i}' for i in range(ix)]
+ ['leaf'])
It gives as expected:
lev_0 lev_1 leaf
0 grand parent parent 1 child 1
1 grand parent parent 1 child 1
2 grand parent parent 1 child 2
3 grand parent parent 2 child 2
4 grand parent NaN child 2
and should accept an arbitrary number of levels
Upvotes: 1