Get NetworkX 2D position of different group of nodes as an additional Pandas DataFrame column

Question

I have a DataFrame structured as an edge list, with additional columns of child-node and edge metadata, and around 10,000 rows.

 Child    |    Parent    |   ChildCategory  |  ChildDescription |  EdgeType | Root |
  C1           'root'             X              Lorem Ipsum        Strong     C1
  C2             C1               X              Lorem Ipsum        Strong     C1
  C3             C2               Y              Lorem Ipsum        Strong     C1
  C4             C2               Y              Lorem Ipsum        Strong     C1
  C5           'root'             X              Lorem Ipsum        Strong     C5
  C6             C5               X              Lorem Ipsum        Strong     C5
  C7             C6               Y              Lorem Ipsum         Weak      C5
  ...           ...              ...                 ...              ..       ..

Using networkx I can transform the dataframe to a graph.

  G = nx.from_pandas_edgelist(df, source="Parent", target="Child", edge_attr=["EdgeType"], create_using=nx.MultiDiGraph())
  node_meta_data = ["ChildCategory", "ChildDescription", "Root"]
  for col in node_meta_data:
      # Map each child node to its metadata value for this column.
      nx.set_node_attributes(G, dict(zip(df["Child"], df[col].fillna('').tolist())), col)

What I want now is the 2D position of each node, computed separately for each group in the Root column, written back to a DataFrame column so I can visualize the nodes in another program.

For the entire graph, I can do it like this:

df = pd.DataFrame(index=G.nodes())
for col in node_meta_data:
     df[col] = pd.Series(nx.get_node_attributes(G, col))
df['EdgeType'] = nx.get_edge_attributes(G,'EdgeType')


### Here is the problem. 
df['position'] = pd.Series(nx.kamada_kawai_layout(G))  ## Without group by root.

#### But I need position per group of root.
....

But how would I do this per root group? Would it be possible to combine pandas groupby with G.subgraph() in a smart way?

EDIT: The position column should hold the position of the node in the Child column.

yatu · Accepted Answer

It looks like you want a separate subgraph starting from each root node. For that you need to rename the root nodes so they can be told apart: as it stands, every top-level row points at the same 'root' node, which joins all the trees into a single graph. One way could be:

# Append a running counter to each "'root'" placeholder: 'root'1, 'root'2, ...
is_root = node_list_df.Parent.eq("'root'")
node_list_df.loc[is_root, 'Parent'] += is_root.cumsum().astype(str)

Which will give:

print(node_list_df)

  Child   Parent ChildCategory ChildDescription EdgeType Root
0    C1  'root'1             X       LoremIpsum   Strong   C1
1    C2       C1             X       LoremIpsum   Strong   C1
2    C3       C2             Y       LoremIpsum   Strong   C1
3    C4       C2             Y       LoremIpsum   Strong   C1
4    C5  'root'2             X       LoremIpsum   Strong   C5
5    C6       C5             X       LoremIpsum   Strong   C5
6    C7       C6             Y       LoremIpsum     Weak   C5

If we now construct the graph from the modified dataframe, we get two separate subgraphs, one for the successors of each root node:

G = nx.from_pandas_edgelist(node_list_df,source="Parent",
                          target= "Child",
                          create_using = nx.DiGraph())

pos = nx.kamada_kawai_layout(G)

nx.draw(G, pos=pos, 
        node_color='lightblue', 
        with_labels=True,
        node_size=500)
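
A quick way to check that the renaming really split the graph is to count its weakly connected components. The sketch below rebuilds the sample edge list by hand, keeping only the Child and Parent columns (an assumption made for brevity), and should report one component per root:

```python
import networkx as nx
import pandas as pd

# Sample edge list mirroring the table above; root labels are already unique.
node_list_df = pd.DataFrame({
    "Child":  ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    "Parent": ["'root'1", "C1", "C2", "C2", "'root'2", "C5", "C6"],
})

G = nx.from_pandas_edgelist(node_list_df, source="Parent", target="Child",
                            create_using=nx.DiGraph())

# Weakly connected components ignore edge direction, which is what we
# want when asking "which nodes belong to the same tree?".
components = [sorted(c) for c in nx.weakly_connected_components(G)]
n_components = len(components)
print(n_components)
for comp in components:
    print(comp)
```

If this printed a single component, the root placeholders would still be shared between trees.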

We can now add the layout positions to the dataframe. Because the merge below is keyed on Parent, the x and y values in each row are those of the row's Parent node:

pos = (pd.DataFrame(pos, index=['x', 'y']).T
         .rename_axis('Parent')
         .reset_index())
df_out = node_list_df.merge(pos, on='Parent', sort=False)

print(df_out)

  Child   Parent ChildCategory ChildDescription EdgeType Root         x  \
0    C1  'root'1             X       LoremIpsum   Strong   C1  1.000000   
1    C2       C1             X       LoremIpsum   Strong   C1  0.467196   
2    C3       C2             Y       LoremIpsum   Strong   C1 -0.055515   
3    C4       C2             Y       LoremIpsum   Strong   C1 -0.055515   
4    C5  'root'2             X       LoremIpsum   Strong   C5 -0.883338   
5    C6       C5             X       LoremIpsum   Strong   C5 -0.345431   
6    C7       C6             Y       LoremIpsum     Weak   C5  0.200324   

          y  
0 -0.002704  
1  0.149699  
2  0.333853  
3  0.333853  
4 -0.230175  
5 -0.363323  
6 -0.459552
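
The question's edit asks for the position of the node in the Child column, whereas the merge above is keyed on Parent. One possible alternative, sketched here under assumptions, is to group by Root, build a small graph per group, and look up each Child in that group's layout. The helper name add_child_positions and its layout parameter are hypothetical and not part of networkx or pandas; the sample frame keeps only the columns the sketch needs.

```python
import networkx as nx
import pandas as pd


def add_child_positions(df, layout=nx.kamada_kawai_layout):
    """Return a copy of df with x/y columns holding each Child's position,
    with the layout computed separately for every Root group.
    (Hypothetical helper, written for illustration only.)"""
    out = df.copy()
    out["x"] = float("nan")
    out["y"] = float("nan")
    for _, group in df.groupby("Root", sort=False):
        # Build a graph from this group's rows only.
        sub = nx.from_pandas_edgelist(group, source="Parent", target="Child",
                                      create_using=nx.DiGraph())
        pos = layout(sub)  # dict: node -> array([x, y])
        # Look up the Child node of every row in this group's layout.
        out.loc[group.index, "x"] = group["Child"].map(lambda n: pos[n][0])
        out.loc[group.index, "y"] = group["Child"].map(lambda n: pos[n][1])
    return out


# Sample data mirroring the question; the shared "'root'" label is left as-is.
df = pd.DataFrame({
    "Child":  ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    "Parent": ["'root'", "C1", "C2", "C2", "'root'", "C5", "C6"],
    "Root":   ["C1", "C1", "C1", "C1", "C5", "C5", "C5"],
})

df_pos = add_child_positions(df)
print(df_pos)
```

Since every group carries its own 'root' row, no renaming is needed with this approach. Each group is laid out in its own coordinate frame centered on the origin, so the groups would overlap if drawn on one canvas; a per-group offset would be needed for a combined picture.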
