Reputation: 7245
I have two dataframes, df and df1. df contains information about some nodes:
df
     Name  Age
0    Jack   33
1    Anna   25
2  Emilie   49
3   Frank   19
4    John   42
while df1 contains the number of contacts between pairs of people. df1 can include people who don't appear in df:
df1
    Name1   Name2  c
0   Frank    Paul  2
1   Julia    Anna  5
2   Frank    John  1
3  Emilie    Jack  3
4     Tom  Steven  2
5     Tom    Jack  5
I would like to create an adjacency matrix whose nodes are those in df, with the connections taken from df1.
To create the adjacency matrix from df1, I did the following:
import networkx as nx
# Recent NetworkX releases call this from_pandas_edgelist (formerly from_pandas_dataframe)
G = nx.from_pandas_edgelist(df1, 'Name1', 'Name2', ['c'])
adj = nx.adjacency_matrix(G)
However, this way there is no direct correspondence with df. What I want is a 5x5 adjacency matrix where row 0 and column 0 correspond to Jack, row 1 and column 1 correspond to Anna, and so on.
Upvotes: 4
Views: 908
Reputation: 57033
The adjacency matrix returned by NetworkX is sparse. First, convert it to a dense matrix:
dense = nx.adjacency_matrix(G).todense()
Create a dataframe whose content is the adjacency matrix and rows and columns represent all nodes:
adj_df = pd.DataFrame(dense, index=list(G.nodes()), columns=list(G.nodes()))
Finally, take the subset of the dataframe defined by df:
adj_df.loc[df.Name, df.Name]
#         Jack  Anna  Emilie  Frank  John
# Jack       0     0       1      0     0
# Anna       0     0       0      0     0
# Emilie     1     0       0      0     0
# Frank      0     0       0      0     1
# John       0     0       0      1     0
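For reference, the steps above can be put together as a self-contained sketch on the sample data from the question. Two things here are assumptions rather than part of the original answer: it uses nx.to_numpy_array with weight="c" so that the entries hold the contact counts instead of 0/1 (pass weight=None for a plain 0/1 matrix), and it uses reindex instead of .loc so that a name in df that never appears in df1 gets a row and column of zeros rather than raising a KeyError.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Name": ["Jack", "Anna", "Emilie", "Frank", "John"],
                   "Age": [33, 25, 49, 19, 42]})
df1 = pd.DataFrame({"Name1": ["Frank", "Julia", "Frank", "Emilie", "Tom", "Tom"],
                    "Name2": ["Paul", "Anna", "John", "Jack", "Steven", "Jack"],
                    "c": [2, 5, 1, 3, 2, 5]})

# Undirected graph over everyone mentioned in df1, keeping "c" as an edge attribute
G = nx.from_pandas_edgelist(df1, "Name1", "Name2", ["c"])

# Dense adjacency matrix labelled by node name (assumption: weighted by "c")
nodes = list(G.nodes())
dense = nx.to_numpy_array(G, nodelist=nodes, weight="c")
adj_df = pd.DataFrame(dense, index=nodes, columns=nodes)

# Keep only the people listed in df, in df's order; unknown names become zeros
names = df["Name"].tolist()
result = adj_df.reindex(index=names, columns=names, fill_value=0)
print(result)
```

Row and column i of result then correspond to row i of df, which is the alignment asked for in the question.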
Upvotes: 2
Reputation: 2624
You can construct a digraph by adding nodes and edges manually:
import networkx as nx

def from_pandas_dataframe(df, col_from, col_to, col_weight=None, nodes=None):
    """Construct a digraph from a dataframe.
    :param df: dataframe containing edge/relation information
    :param col_from: dataframe column name for the start of an edge
    :param col_to: dataframe column name for the end of an edge
    :param col_weight: dataframe column name for the edge weight, defaults to 1 if not provided
    :param nodes: nodes for the graph, defaults to the nodes found in df if not provided
    :return: the constructed directed graph
    """
    # DiGraph keeps nodes in insertion order on current Python versions;
    # newer NetworkX releases no longer ship OrderedDiGraph
    g = nx.DiGraph()
    # add nodes
    if nodes is None:
        nodes = set(df[col_from]) | set(df[col_to])
    g.add_nodes_from(nodes)
    # add edges, skipping any edge with an endpoint outside `nodes`
    for _, row in df.iterrows():
        from_node, to_node = row[col_from], row[col_to]
        if from_node in nodes and to_node in nodes:
            weight = 1 if not col_weight else row[col_weight]
            g.add_edge(from_node, to_node, weight=weight)
    return g
The nodes parameter specifies the nodes in the graph; any edge with an endpoint that is not in nodes is omitted:
g = from_pandas_dataframe(df1, 'Name1', 'Name2', col_weight='c', nodes=df['Name'].tolist())
adj = nx.adjacency_matrix(g)
Running on the sample data:
>>> print(g.nodes)
['Jack', 'Anna', 'Emilie', 'Frank', 'John']
>>> print(adj)
(2, 0) 3
(3, 4) 1
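As a possible variation on the same idea, the row-by-row loop can be replaced by a pandas filter. This is only a sketch under a few assumptions that are not in the answer above: the sample frames are rebuilt inline, the edges are filtered with isin, and the result is returned as a dense array via nx.to_numpy_array rather than a sparse matrix.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Name": ["Jack", "Anna", "Emilie", "Frank", "John"],
                   "Age": [33, 25, 49, 19, 42]})
df1 = pd.DataFrame({"Name1": ["Frank", "Julia", "Frank", "Emilie", "Tom", "Tom"],
                    "Name2": ["Paul", "Anna", "John", "Jack", "Steven", "Jack"],
                    "c": [2, 5, 1, 3, 2, 5]})

names = df["Name"].tolist()

# Add the nodes first so that node order follows df
g = nx.DiGraph()
g.add_nodes_from(names)

# Keep only edges whose two endpoints both appear in df
mask = df1["Name1"].isin(names) & df1["Name2"].isin(names)
edges = df1.loc[mask, ["Name1", "Name2", "c"]].itertuples(index=False, name=None)

# Each tuple is (source, target, weight); stored under the "weight" attribute
g.add_weighted_edges_from(edges)

# Dense adjacency matrix with rows/columns in df's order
adj = nx.to_numpy_array(g, nodelist=names)
print(adj)
```

On the sample data this yields the same two directed edges as above, Emilie to Jack with weight 3 and Frank to John with weight 1.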
Upvotes: 0