Reputation: 799
I have a tree of categories represented by the following.
import pandas as pd
asset_tree = [
{'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
{'id': 2, 'name': 'Lateral', 'parent_id': 1},
{'id': 3, 'name': 'Main', 'parent_id': 1},
{'id': 4, 'name': 'Point Asset', 'parent_id': -1},
{'id': 5, 'name': 'Fountain', 'parent_id': 4},
{'id': 6, 'name': 'Hydrant', 'parent_id': 4}
]
tree = pd.DataFrame(asset_tree)
print(tree)
This gives me a dataframe as follows:
id name parent_id
0 1 Linear Asset -1
1 2 Lateral 1
2 3 Main 1
3 4 Point Asset -1
4 5 Fountain 4
5 6 Hydrant 4
The highest nodes in the tree have parent_id equal to -1, and so the tree can graphically be represented as follows:
Linear Asset
| - Lateral
| - Main
Point Asset
| - Fountain
| - Hydrant
I need to generate the following dataframe.
id name parent_id flat_name
0 1 Linear Asset -1 Linear Asset
1 2 Lateral 1 Linear Asset : Lateral
2 3 Main 1 Linear Asset : Main
3 4 Point Asset -1 Point Asset
4 5 Fountain 4 Point Asset : Fountain
5 6 Hydrant 4 Point Asset : Hydrant
The tree is generated dynamically and can have any number of levels and so the following tree
asset_tree = [
{'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
{'id': 2, 'name': 'Lateral', 'parent_id': 1},
{'id': 3, 'name': 'Main', 'parent_id': 1},
{'id': 4, 'name': 'Point Asset', 'parent_id': -1},
{'id': 5, 'name': 'Fountain', 'parent_id': 4},
{'id': 6, 'name': 'Hydrant', 'parent_id': 4},
{'id': 7, 'name': 'Steel', 'parent_id': 2},
{'id': 8, 'name': 'Plastic', 'parent_id': 2},
{'id': 9, 'name': 'Steel', 'parent_id': 3},
{'id': 10, 'name': 'Plastic', 'parent_id': 3}
]
should result in the following:
id name parent_id flat_name
0 1 Linear Asset -1 Linear Asset
1 2 Lateral 1 Linear Asset : Lateral
2 3 Main 1 Linear Asset : Main
3 4 Point Asset -1 Point Asset
4 5 Fountain 4 Point Asset : Fountain
5 6 Hydrant 4 Point Asset : Hydrant
6 7 Steel 2 Linear Asset : Lateral : Steel
7 8 Plastic 2 Linear Asset : Lateral : Plastic
8 9 Steel 3 Linear Asset : Main : Steel
9 10 Plastic 3 Linear Asset : Main : Plastic
Upvotes: 3
Views: 1012
Reputation: 8790
Here is a recursive apply
function for accomplishing this. The function takes in an id
and returns its "path" through the tree:
def flatname(ID):
row = df[df['id'] == ID].squeeze()
if row['parent_id'] == -1:
return row['name']
else:
return flatname(row['parent_id']) + ' : ' + row['name']
To use, call:
df['flat_name'] = df['id'].apply(flatname)
The df
after used on your second example:
id name parent_id flat_name
0 1 Linear Asset -1 Linear Asset
1 2 Lateral 1 Linear Asset : Lateral
2 3 Main 1 Linear Asset : Main
3 4 Point Asset -1 Point Asset
4 5 Fountain 4 Point Asset : Fountain
5 6 Hydrant 4 Point Asset : Hydrant
6 7 Steel 2 Linear Asset : Lateral : Steel
7 8 Plastic 2 Linear Asset : Lateral : Plastic
8 9 Steel 3 Linear Asset : Main : Steel
9 10 Plastic 3 Linear Asset : Main : Plastic
OP noted that the above function refers explicitly to the df
variable defined outside of the function-scope. So if you call your DataFrame something different, or you want to call this on many DataFrames, this could cause problems. One fix would be to turn the apply
function into more of a private helper, and create an external (more user-friendly) function which calls it:
def _flatname_recurse(ID, df):
row = df[df['id'] == ID].squeeze()
if row['parent_id'] == -1:
return row['name']
else:
return _flatname_recurse(row['parent_id'], df=df) + ' : ' + row['name']
# asset_df to specify we are looking for a specific kind of df
def flatnames(asset_df):
return asset_df['id'].apply(_flatname_recurse, df=asset_df)
Then call with:
df['flat_name'] = flatnames(df)
Also, note that I used to use row = df.iloc[ID - 1, :]
for identifying the row, which worked in this case but was dependent on the id
being one greater than the index. This approach is more general.
Upvotes: 5
Reputation: 71451
You can use recursion to find the path to the parent id:
import pandas as pd
asset_tree = [{'id': 1, 'name': 'Linear Asset', 'parent_id': -1}, {'id': 2, 'name': 'Lateral', 'parent_id': 1}, {'id': 3, 'name': 'Main', 'parent_id': 1}, {'id': 4, 'name': 'Point Asset', 'parent_id': -1}, {'id': 5, 'name': 'Fountain', 'parent_id': 4}, {'id': 6, 'name': 'Hydrant', 'parent_id': 4}]
a_tree = {i['id']:i for i in asset_tree} #to dictionary for more efficient lookup
def get_parent(d, c = []):
if (k:=a_tree.get(d['parent_id'])) is None:
return c + [d['name']]
return get_parent(k, c+[d['name']])
r = [{**i, 'flat_name':' : '.join(get_parent(i)[::-1])} for i in asset_tree]
df = pd.DataFrame(r)
Output:
id name parent_id flat_name
0 1 Linear Asset -1 Linear Asset
1 2 Lateral 1 Linear Asset : Lateral
2 3 Main 1 Linear Asset : Main
3 4 Point Asset -1 Point Asset
4 5 Fountain 4 Point Asset : Fountain
5 6 Hydrant 4 Point Asset : Hydrant
On your larger asset_tree
:
asset_tree = [{'id': 1, 'name': 'Linear Asset', 'parent_id': -1}, {'id': 2, 'name': 'Lateral', 'parent_id': 1}, {'id': 3, 'name': 'Main', 'parent_id': 1}, {'id': 4, 'name': 'Point Asset', 'parent_id': -1}, {'id': 5, 'name': 'Fountain', 'parent_id': 4}, {'id': 6, 'name': 'Hydrant', 'parent_id': 4}, {'id': 7, 'name': 'Steel', 'parent_id': 2}, {'id': 8, 'name': 'Plastic', 'parent_id': 2}, {'id': 9, 'name': 'Steel', 'parent_id': 3}, {'id': 10, 'name': 'Plastic', 'parent_id': 3}]
a_tree = {i['id']:i for i in asset_tree}
r = [{**i, 'flat_name':' : '.join(get_parent(i)[::-1])} for i in asset_tree]
df = pd.DataFrame(r)
Output:
id name parent_id flat_name
0 1 Linear Asset -1 Linear Asset
1 2 Lateral 1 Linear Asset : Lateral
2 3 Main 1 Linear Asset : Main
3 4 Point Asset -1 Point Asset
4 5 Fountain 4 Point Asset : Fountain
5 6 Hydrant 4 Point Asset : Hydrant
6 7 Steel 2 Linear Asset : Lateral : Steel
7 8 Plastic 2 Linear Asset : Lateral : Plastic
8 9 Steel 3 Linear Asset : Main : Steel
9 10 Plastic 3 Linear Asset : Main : Plastic
Upvotes: 2
Reputation: 150725
This is a network problem, try networkx
:
import networkx as nx
# build the graph
G = nx.from_pandas_edgelist(tree, source='parent_id', target='id',
create_using=nx.DiGraph)
# map id to name
node_names = tree.set_index('id')['name'].to_dict()
# get path from root (-1) to the node
def get_path(node):
# this is a tree, so exactly one simple path for each node
for path in nx.simple_paths.all_simple_paths(G, -1, node):
return ' : '.join(node_names.get(i) for i in path[1:])
tree['flat_name'] = tree['id'].apply(get_path)
Output:
id name parent_id flat_name
0 1 Linear Asset -1 Linear Asset
1 2 Lateral 1 Linear Asset : Lateral
2 3 Main 1 Linear Asset : Main
3 4 Point Asset -1 Point Asset
4 5 Fountain 4 Point Asset : Fountain
5 6 Hydrant 4 Point Asset : Hydrant
6 7 Steel 2 Linear Asset : Lateral : Steel
7 8 Plastic 2 Linear Asset : Lateral : Plastic
8 9 Steel 3 Linear Asset : Main : Steel
9 10 Plastic 3 Linear Asset : Main : Plastic
Upvotes: 3