Reputation: 1040
I have a DataFrame of around 10,000 rows, structured as an edge list with additional columns holding child-node and edge metadata.
Child | Parent | ChildCategory | ChildDescription | EdgeType | Root
C1    | 'root' | X             | Lorem Ipsum      | Strong   | C1
C2    | C1     | X             | Lorem Ipsum      | Strong   | C1
C3    | C2     | Y             | Lorem Ipsum      | Strong   | C1
C4    | C2     | Y             | Lorem Ipsum      | Strong   | C1
C5    | 'root' | X             | Lorem Ipsum      | Strong   | C5
C6    | C5     | X             | Lorem Ipsum      | Strong   | C5
C7    | C6     | Y             | Lorem Ipsum      | Weak     | C5
...   | ...    | ...           | ...              | ...      | ...
Using networkx I can transform the dataframe to a graph.
import networkx as nx
import pandas as pd
G = nx.from_pandas_edgelist(df, source="Parent", target="Child", edge_attr=["EdgeType"], create_using=nx.MultiDiGraph())
node_meta_data = ["ChildCategory", "ChildDescription", "Root"]
for col in node_meta_data:
    # node_list_df must yield the node labels in the same row order as df (e.g. df["Child"])
    nx.set_node_attributes(G, dict(zip(node_list_df, df[col].fillna('').tolist())), col)
What I want to do now is compute the 2D position of each node within each group of the Root column, and write it back to a DataFrame column so I can visualize the nodes in another program.
For the entire graph, I can do it like this:
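df = pd.DataFrame(index=G.nodes())
for col in node_meta_data:
    df[col] = pd.Series(nx.get_node_attributes(G, col))
# Edge attributes are keyed by (parent, child, key); re-key them by child to match the node index.
df['EdgeType'] = pd.Series({child: t for (_parent, child, _key), t in nx.get_edge_attributes(G, 'EdgeType').items()})
### Here is the problem.
df['position'] = pd.Series(nx.kamada_kawai_layout(G))  ## Without group by root.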
#### But I need position per group of root.
....
But how would I do this per Root group? Would it be possible to combine pandas groupby with G.subgraph() in a smart way?
EDIT: The position column should hold the position of the node in the Child column.
Upvotes: 0
Views: 342
Reputation: 88236
It looks like you want a separate subgraph starting from each root node. All root rows currently share the same 'root' parent, which ties the trees together in the graph, so each root needs its own distinct name. One way could be:
# node_list_df is the edge-list dataframe from the question
is_root = node_list_df.Parent.eq("'root'")
node_list_df.loc[is_root, 'Parent'] += is_root.cumsum().astype(str)
Which will give:
print(node_list_df)
   Child   Parent  ChildCategory  ChildDescription  EdgeType  Root
0     C1  'root'1              X        LoremIpsum    Strong    C1
1     C2       C1              X        LoremIpsum    Strong    C1
2     C3       C2              Y        LoremIpsum    Strong    C1
3     C4       C2              Y        LoremIpsum    Strong    C1
4     C5  'root'2              X        LoremIpsum    Strong    C5
5     C6       C5              X        LoremIpsum    Strong    C5
6     C7       C6              Y        LoremIpsum      Weak    C5
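To check that the renaming really separates the trees, here is a minimal sketch on a toy copy of the sample rows (the data and the names components and n_components are only illustrative). It counts the weakly connected components of the resulting graph, which should equal the number of roots:

```python
import networkx as nx
import pandas as pd

# Toy edge list mirroring the sample rows above (illustrative values only).
node_list_df = pd.DataFrame({
    "Child":  ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    "Parent": ["'root'", "C1", "C2", "C2", "'root'", "C5", "C6"],
    "Root":   ["C1", "C1", "C1", "C1", "C5", "C5", "C5"],
})

# Give every root row its own parent label: 'root'1, 'root'2, ...
is_root = node_list_df["Parent"].eq("'root'")
suffix = is_root.cumsum()[is_root].astype(str)
node_list_df.loc[is_root, "Parent"] = node_list_df.loc[is_root, "Parent"] + suffix

G = nx.from_pandas_edgelist(node_list_df, source="Parent", target="Child",
                            create_using=nx.DiGraph())

# Each tree should now be its own weakly connected component.
components = [sorted(c) for c in nx.weakly_connected_components(G)]
n_components = len(components)
print(n_components, components)
```

If the count comes out lower than the number of roots, some rows still share a parent label.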
If we now construct the graph from the modified dataframe, we get two disconnected subgraphs, one for the successors of each root node:
G = nx.from_pandas_edgelist(node_list_df, source="Parent",
                            target="Child",
                            create_using=nx.DiGraph())
pos = nx.kamada_kawai_layout(G)
nx.draw(G, pos=pos,
        node_color='lightblue',
        with_labels=True,
        node_size=500)
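We can now add the layout positions to the dataframe. Note that the merge below is keyed on Parent, so the x and y on each row are the coordinates of that row's parent node: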
pos = (pd.DataFrame(pos, index=['x', 'y']).T
         .rename_axis('Parent')
         .reset_index())
df_out = node_list_df.merge(pos, on='Parent', sort=False)
print(df_out)
   Child   Parent  ChildCategory  ChildDescription  EdgeType  Root          x  \
0     C1  'root'1              X        LoremIpsum    Strong    C1   1.000000
1     C2       C1              X        LoremIpsum    Strong    C1   0.467196
2     C3       C2              Y        LoremIpsum    Strong    C1  -0.055515
3     C4       C2              Y        LoremIpsum    Strong    C1  -0.055515
4     C5  'root'2              X        LoremIpsum    Strong    C5  -0.883338
5     C6       C5              X        LoremIpsum    Strong    C5  -0.345431
6     C7       C6              Y        LoremIpsum      Weak    C5   0.200324

           y
0  -0.002704
1   0.149699
2   0.333853
3   0.333853
4  -0.230175
5  -0.363323
6  -0.459552
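If the positions should follow the Child column and be computed independently per Root group, as the question asks, one possible variant is to lay out one G.subgraph() per group. This is only a sketch under some assumptions: the data is a toy copy of the sample rows, the root label is written as a plain "root", layout_per_root is a hypothetical helper name, and kamada_kawai_layout needs SciPy installed. Because each subgraph is restricted to the group's own child nodes, the shared root node does not need to be renamed here.

```python
import networkx as nx
import pandas as pd

# Toy edge list mirroring the sample rows (illustrative values only).
df = pd.DataFrame({
    "Child":  ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    "Parent": ["root", "C1", "C2", "C2", "root", "C5", "C6"],
    "Root":   ["C1", "C1", "C1", "C1", "C5", "C5", "C5"],
})

G = nx.from_pandas_edgelist(df, source="Parent", target="Child",
                            create_using=nx.DiGraph())


def layout_per_root(G, df):
    """Return {child: array([x, y])} with one independent layout per Root group."""
    positions = {}
    for _root, group in df.groupby("Root"):
        # Keep only this group's child nodes; the shared root node is left out.
        sub = G.subgraph(group["Child"].tolist())
        positions.update(nx.kamada_kawai_layout(sub))
    return positions


pos = layout_per_root(G, df)

# Key the coordinates by Child so each row gets its own node's position.
df["x"] = df["Child"].map({node: xy[0] for node, xy in pos.items()})
df["y"] = df["Child"].map({node: xy[1] for node, xy in pos.items()})
print(df)
```

Each group is laid out in its own coordinate frame (by default centered on the origin), so the groups will overlap if plotted together unless you shift them, for example via the center argument of kamada_kawai_layout.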
Upvotes: 1