emax

Reputation: 7245

Python: how to create a graph with networkx with correspondence with another dataframe?

I have two dataframes, df and df1. df contains information about some nodes:

df  Name       Age
0   Jack       33
1   Anna       25
2   Emilie     49
3   Frank      19
4   John       42

while df1 contains the number of contacts between two people. df1 can include people who don't appear in df:

df1    Name1    Name2   c
0      Frank    Paul    2
1      Julia    Anna    5
2      Frank    John    1
3      Emilie   Jack    3
4      Tom      Steven  2
5      Tom      Jack    5

I would like to create an adjacency matrix whose nodes are the people in df and whose connections come from df1.

In order to create the adjacency matrix from df1, I did the following:

import networkx as nx
G = nx.Graph()
G = nx.from_pandas_dataframe(df1, 'Name1', 'Name2', ['c'])  # named from_pandas_edgelist in NetworkX 2.0+
adj = nx.adjacency_matrix(G)

However, this way there is no direct correspondence with df. I would like to generate a 5x5 adjacency matrix (one row and one column per person in df) where row 0 and column 0 correspond to Jack, row 1 and column 1 correspond to Anna, and so on.

Upvotes: 4

Views: 908

Answers (2)

DYZ

Reputation: 57033

The adjacency matrix returned by NetworkX is sparse. First, convert it to a dense matrix:

dense = nx.adjacency_matrix(G).todense()

Create a dataframe from the adjacency matrix, with its rows and columns labeled by all the nodes of G:

import pandas as pd
adj_df = pd.DataFrame(dense, index=list(G.nodes()), columns=list(G.nodes()))

Finally, take the subset of the dataframe, as defined by df:

adj_df.loc[df.Name, df.Name]
#        Jack  Anna  Emilie  Frank  John
#Jack       0     0       1      0     0
#Anna       0     0       0      0     0
#Emilie     1     0       0      0     0
#Frank      0     0       0      0     1
#John       0     0       0      1     0
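
The entries above are 0/1 because the contact counts are stored in the edge attribute c, while adjacency_matrix looks for an attribute named weight by default. If the counts themselves are wanted, something along these lines may work. It is only a sketch: it assumes NetworkX 2.x or later (where the reader is called from_pandas_edgelist), rebuilds the sample frames from the question, and the names weighted and weighted_df are made up for the example.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["Jack", "Anna", "Emilie", "Frank", "John"],
                   "Age": [33, 25, 49, 19, 42]})
df1 = pd.DataFrame({"Name1": ["Frank", "Julia", "Frank", "Emilie", "Tom", "Tom"],
                    "Name2": ["Paul", "Anna", "John", "Jack", "Steven", "Jack"],
                    "c": [2, 5, 1, 3, 2, 5]})

# Assumption: NetworkX 2.x+, where from_pandas_dataframe became from_pandas_edgelist.
G = nx.from_pandas_edgelist(df1, "Name1", "Name2", edge_attr="c")
G.add_nodes_from(df["Name"])  # keep people from df that never appear in df1

names = df["Name"].tolist()
# Restrict to the people in df, order rows/columns like df,
# and read the contact counts from the "c" attribute instead of 0/1.
weighted = nx.to_numpy_array(G.subgraph(names), nodelist=names,
                             weight="c", dtype=int)
weighted_df = pd.DataFrame(weighted, index=names, columns=names)
print(weighted_df)
```

Passing nodelist fixes the row and column order directly, so the final .loc reindexing step is not needed in this variant.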

Upvotes: 2

CtheSky

Reputation: 2624

You can construct a digraph by adding nodes and edges manually:

def from_pandas_dataframe(df, col_from, col_to, col_weight=None, nodes=None):
    """Construct a digraph from a dataframe of edges.

    :param df: dataframe containing the edge/relation information
    :param col_from: name of the dataframe column holding the start of each edge
    :param col_to: name of the dataframe column holding the end of each edge
    :param col_weight: name of the dataframe column holding the edge weight; weights default to 1 if not provided
    :param nodes: nodes of the graph; defaults to all nodes found in df if not provided
    :return: the constructed directed graph
    """
    # OrderedDiGraph was removed in NetworkX 3.0; a plain DiGraph keeps
    # insertion order on Python 3.7+.
    g = nx.DiGraph()

    # add nodes
    if nodes is None:
        nodes = set(df[col_from]) | set(df[col_to])
    g.add_nodes_from(nodes)

    # add edges
    for _, row in df.iterrows():
        from_node, to_node = row[col_from], row[col_to]
        if from_node in nodes and to_node in nodes:
            weight = 1 if not col_weight else row[col_weight]
            g.add_edge(from_node, to_node, weight=weight)

    return g

The nodes parameter specifies the nodes of the graph; any edge with an endpoint that is not in nodes is omitted:

g = from_pandas_dataframe(df1, 'Name1', 'Name2', col_weight='c', nodes=df['Name'].tolist())
adj = nx.adjacency_matrix(g)

Running on the sample data:

>>> print(g.nodes)
['Jack', 'Anna', 'Emilie', 'Frank', 'John']
>>> print(adj)
  (2, 0)    3
  (3, 4)    1
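
Because the graph is directed, each contact shows up in one direction only, e.g. (2, 0) for Emilie to Jack but not (0, 2). If contacts are meant to be symmetric (an assumption, the question does not say so), a variant like the following may be closer to what is wanted. It is a rough sketch rather than a drop-in replacement: it filters df1 with pandas instead of looping over rows, uses an undirected nx.Graph, and returns a labeled dataframe via nx.to_pandas_adjacency. The names known and adj_df are invented for the example.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["Jack", "Anna", "Emilie", "Frank", "John"],
                   "Age": [33, 25, 49, 19, 42]})
df1 = pd.DataFrame({"Name1": ["Frank", "Julia", "Frank", "Emilie", "Tom", "Tom"],
                    "Name2": ["Paul", "Anna", "John", "Jack", "Steven", "Jack"],
                    "c": [2, 5, 1, 3, 2, 5]})

names = df["Name"].tolist()

# Keep only the contacts whose two endpoints are both listed in df.
known = df1[df1["Name1"].isin(names) & df1["Name2"].isin(names)]

g = nx.Graph()           # assumption: contacts are symmetric
g.add_nodes_from(names)  # keeps people without any contact as isolated nodes
# Each row becomes a (source, target, weight) tuple.
g.add_weighted_edges_from(
    known[["Name1", "Name2", "c"]].itertuples(index=False, name=None)
)

# Rows and columns follow the order of df["Name"].
adj_df = nx.to_pandas_adjacency(g, nodelist=names, weight="weight", dtype=int)
print(adj_df)
```

With the sample data, only the Emilie/Jack and Frank/John contacts survive the filter, and both appear symmetrically in adj_df.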

Upvotes: 0
