matan

Reputation: 461

How to get the chain of all connected edges in a DAG (pandas/networkx)

I have a DAG that I convert to a pandas DataFrame.

The DataFrame is:

import pandas as pd

df = pd.DataFrame({'dad': [1, 2, 3, 4, 5, "T1", "T2"],
                   'children': ["T1", "T1", "T2", "T2", 6, "T3", "T3"]})
print(df)

I want to get a list of all the nodes/edges that are connected in my DAG (graph), so that it looks like this:

df = pd.DataFrame({'dad': [1, 2, 3, 4, 5, "T1", "T2", "T3"],
                   'children': ["T1", "T1", "T2", "T2", 6, "T3", "T3", "X"],
                   'chain': [0, 0, 0, 0, 0, [1, 2], [3, 4], [1, 2, 3, 4, "T1", "T2"]]})

I'd like to know how the edges connect along the whole chain, as in the new "chain" column. It can be a new column like here, and the order within each chain does not matter.
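Reading the expected table, the "chain" of a node looks like the set of its ancestors (every node with a directed path to it): T1 gets [1, 2], T2 gets [3, 4] and T3 gets [1, 2, 3, 4, T1, T2]. A minimal sketch under that assumption, using networkx's `nx.ancestors`; the "X" placeholder child and the choice to add a row for every sink (node with no outgoing edge) are assumptions taken from the expected output, not a confirmed spec:

```python
import pandas as pd
import networkx as nx

# Edge list from the question: each row is an edge dad -> children
df = pd.DataFrame({'dad': [1, 2, 3, 4, 5, "T1", "T2"],
                   'children': ["T1", "T1", "T2", "T2", 6, "T3", "T3"]})
G = nx.from_pandas_edgelist(df, 'dad', 'children', create_using=nx.DiGraph())

# Add one row per sink (no outgoing edge); "X" is just the placeholder
# child used in the question's expected output.
sinks = [n for n in G.nodes if G.out_degree(n) == 0]
df = pd.concat([df, pd.DataFrame({'dad': sinks, 'children': 'X'})],
               ignore_index=True)

# Assumed meaning of "chain": every node that can reach this one (its ancestors).
df['chain'] = df['dad'].map(lambda n: list(nx.ancestors(G, n)))
print(df)
```

Unlike the expected table, nodes without ancestors get an empty list instead of 0, and the sink 6 also gets a row (with chain [5]); drop or adjust those rows if they are not wanted.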

I use pandas and networkx, but I would also be happy to learn about another Python library for DAGs like networkx.

The graph looks like it contains two trees.
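The "two trees" observation can be checked directly. A small sketch, assuming "tree" means that, once edge direction is ignored, the graph splits into separate pieces and each piece becomes an arborescence rooted at its sink when the edges are reversed (this holds here because every node has at most one child):

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({'dad': [1, 2, 3, 4, 5, "T1", "T2"],
                   'children': ["T1", "T1", "T2", "T2", 6, "T3", "T3"]})
G = nx.from_pandas_edgelist(df, 'dad', 'children', create_using=nx.DiGraph())

# Ignoring edge direction, count the separate pieces of the graph.
n_pieces = nx.number_weakly_connected_components(G)

# Reverse the edges so each piece is rooted at its sink (T3 or 6); the piece
# is a tree if it is an arborescence (connected, one parent per non-root node).
R = G.reverse()
pieces_are_trees = all(nx.is_arborescence(R.subgraph(c))
                       for c in nx.weakly_connected_components(G))
print(n_pieces, pieces_are_trees)  # 2 True
```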

Upvotes: 2

Views: 728

Answers (1)

Scott Boston

Reputation: 153550

You can use networkx as @QuangHoang suggests like this:

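import pandas as pd
import networkx as nx

df = pd.DataFrame({'dad': [1, 2, 3, 4, 5, "T1", "T2"],
                   'children': ["T1", "T1", "T2", "T2", 6, "T3", "T3"]})
# Build a directed graph with one edge dad -> children per row
G = nx.from_pandas_edgelist(df, 'dad', 'children', create_using=nx.DiGraph())
# For each 'dad', list its direct parents (predecessors) in the graph
df['chain'] = df['dad'].transform(lambda x: list(G.predecessors(x)))
print(df)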

Output:

  dad children   chain
0   1       T1      []
1   2       T1      []
2   3       T2      []
3   4       T2      []
4   5        6      []
5  T1       T3  [1, 2]
6  T2       T3  [3, 4]
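`G.predecessors` returns only direct parents. For T1 and T2 those happen to be their whole chain, but for a deeper node such as T3 they would be only T1 and T2. A minimal sketch on a hypothetical three-node chain a -> b -> c shows the difference from `nx.ancestors`, which follows paths of any length:

```python
import networkx as nx

# Hypothetical three-node chain: a -> b -> c
G = nx.DiGraph([("a", "b"), ("b", "c")])

direct = sorted(G.predecessors("c"))       # only the immediate parent
transitive = sorted(nx.ancestors(G, "c"))  # every node with a path to "c"
print(direct, transitive)  # ['b'] ['a', 'b']
```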

I think what you need are the weakly connected components of the DiGraph. Here is a way to generate those components as chains:

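import pandas as pd
import networkx as nx

df = pd.DataFrame({'dad': [1, 2, 3, 4, 5, "T1", "T2"],
                   'children': ["T1", "T1", "T2", "T2", 6, "T3", "T3"]})
G = nx.from_pandas_edgelist(df, 'dad', 'children', create_using=nx.DiGraph())
df['chain'] = df['dad'].transform(lambda x: list(G.predecessors(x)))

# Each weakly connected component (edge direction ignored) is one "tree"
w_list = list(nx.weakly_connected_components(G))
# Components are sets, so list(n)[-1] would pick an arbitrary node;
# use the component's sink (the node with no outgoing edge) as 'dad' instead.
df_comp = pd.DataFrame({'dad': [next(v for v in n if G.out_degree(v) == 0) for n in w_list],
                        'children': ['X' for _ in w_list],
                        'chain': [list(x) for x in w_list]})

df_out = pd.concat([df, df_comp])
print(df_out)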

Output:

  dad children                     chain
0   1       T1                        []
1   2       T1                        []
2   3       T2                        []
3   4       T2                        []
4   5        6                        []
5  T1       T3                    [1, 2]
6  T2       T3                    [3, 4]
0  T3        X  [1, 2, 3, 4, T1, T2, T3]
1   6        X                    [5, 6]
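The component chains above are built from Python sets, so their order is not guaranteed and can change between runs (string hashing is randomized per interpreter). The question says order does not matter, but if a reproducible order is wanted, one option (a sketch, not part of the answer above) is a topological sort of each component. It places every node before its descendants, so the last node is a sink of that component:

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({'dad': [1, 2, 3, 4, 5, "T1", "T2"],
                   'children': ["T1", "T1", "T2", "T2", 6, "T3", "T3"]})
G = nx.from_pandas_edgelist(df, 'dad', 'children', create_using=nx.DiGraph())

rows = []
for comp in nx.weakly_connected_components(G):
    # Topological order: every node comes before all of its descendants,
    # so the final node has no outgoing edge inside the component.
    order = list(nx.topological_sort(G.subgraph(comp)))
    rows.append({'dad': order[-1], 'children': 'X', 'chain': order})

df_comp = pd.DataFrame(rows)
print(df_comp)
```

If a component has several sinks, `order[-1]` is only one of them.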

Upvotes: 4
