Reputation: 2510
I have code that aims to generate a graph from an adjacency matrix built from a table that maps workers to their managers. The source is a table with two columns (Worker, manager). It works perfectly on a small mock data set, but fails unexpectedly with the real data:
import pandas as pd
import networkx as nx
# Read input
df = pd.read_csv("org.csv")
# Create the input adjacency matrix
am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,
# or that index and columns don't match
# Fill the matrix
for ix, row in df.iterrows():
    am.at[row["manager"], row["Worker"]] = 1
# At this point, am.shape returns a square dataframe (2825,2825)
# Generate the graph
G = nx.from_pandas_adjacency(am, create_using=nx.DiGraph)
This returns: NetworkXError: Adjacency matrix not square: nx,ny=(2825, 2829)
And indeed, the dimensions reported in the error are not the same as those of the input dataframe am.
Does anyone have an idea of what happens in from_pandas_adjacency that could lead to this mismatch?
Upvotes: 2
Views: 41
Reputation: 102419
First of all, your "adjacency matrix" is not a real adjacency matrix; it is effectively an "incidence matrix".
I didn't find a straightforward utility in networkx that supports generating a directed graph from an incidence matrix. However, the igraph package in R has such functionality, which can show how it should work. For example:
library(igraph)
df <- data.frame(
  manager = c("A", "B", "A"),
  worker = c("D", "E", "F")
)
am <- table(df)
g <- graph_from_biadjacency_matrix(am, directed = TRUE, mode = "out")
plot(g)
where
> print(df)
  manager worker
1       A      D
2       B      E
3       A      F
> print(am)
       worker
manager D E F
      A 1 0 1
      B 0 1 0
such that g is a directed graph with the edges A -> D, A -> F and B -> E.
Again, the real "adjacency matrix" should look like this
> as_adjacency_matrix(g)
5 x 5 sparse Matrix of class "dgCMatrix"
  A B D E F
A . . 1 . 1
B . . . 1 .
D . . . . .
E . . . . .
F . . . . .
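For readers who want to stay in Python, the same idea can be sketched with pandas and networkx. This is only an illustrative sketch, not a built-in networkx routine for this case: the names `bi`, `edges` and `adj` are made up here, and the data is the same three-row toy table as in the R snippet.

```python
import numpy as np
import pandas as pd
import networkx as nx

df = pd.DataFrame({"manager": ["A", "B", "A"], "worker": ["D", "E", "F"]})

# Manager-by-worker count table, the counterpart of table(df) in R
bi = pd.crosstab(df["manager"], df["worker"])

# Every nonzero cell (row i, column j) becomes a directed edge manager -> worker
rows, cols = np.nonzero(bi.to_numpy())
edges = list(zip(bi.index[rows], bi.columns[cols]))

G = nx.DiGraph()
G.add_edges_from(edges)

# Square adjacency matrix over the union of managers and workers
adj = nx.to_pandas_adjacency(G, nodelist=sorted(G.nodes), dtype=int)
print(adj)
```

The rectangular table only lists managers as rows and workers as columns, whereas the adjacency matrix built at the end has every node on both axes.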
Upvotes: 0
Reputation: 262114
In:
am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,
your DataFrame is indeed square, but when you later assign values in the loop, if you have a manager that is not in "Worker", this will create a new row:
am.at[row["manager"], row["Worker"]]
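A minimal sketch of that effect on toy data (the names `workers`, `pairs` and `missing` are made up for illustration): assigning through `.at` with a row label that is not yet in the index enlarges the frame instead of raising an error.

```python
import pandas as pd

workers = ["D", "E", "F"]
pairs = [("A", "D"), ("B", "E"), ("A", "F")]  # (manager, worker)

am = pd.DataFrame(0, index=workers, columns=workers)
shape_before = am.shape

# Managers "A" and "B" never appear as workers, so they are not in the index
missing = sorted({m for m, _ in pairs} - set(workers))

for manager, worker in pairs:
    # Setting a value for an unknown row label appends that row
    am.at[manager, worker] = 1

shape_after = am.shape
print(missing, shape_before, shape_after)
```

Running the same set difference on the real data (managers minus workers) is a quick way to check whether any manager is missing from the "Worker" column.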
Better to avoid the loop: use a crosstab, then reindex on the whole set of nodes:
am = pd.crosstab(df['manager'], df['Worker'])
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)
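Put together on a small toy table, the crosstab approach could look like the sketch below. The data is illustrative only; the column names match the ones used in the question.

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

# Rectangular manager-by-worker table of counts
am = pd.crosstab(df["manager"], df["Worker"])

# Make it square: every node appears on both axes, missing cells become 0
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)

G = nx.from_pandas_adjacency(am, create_using=nx.DiGraph)
print(am.shape)
print(sorted(G.edges))
```

Note that `pd.crosstab` counts occurrences, so a (manager, Worker) pair that appears twice in the input ends up as a 2 in the matrix rather than a 1.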
Even better, if you don't really need the adjacency matrix, create the graph directly with nx.from_pandas_edgelist:
G = nx.from_pandas_edgelist(df, source='manager', target='Worker',
                            create_using=nx.DiGraph)
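As a rough, self-contained sketch of how the resulting graph might then be queried (toy data; `top_level` and `reports_of_a` are hypothetical names chosen for this illustration):

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

# One directed edge manager -> Worker per row, no matrix needed
G = nx.from_pandas_edgelist(df, source="manager", target="Worker",
                            create_using=nx.DiGraph)

# Nodes with no incoming edge have no manager in the table
top_level = sorted(n for n, deg in G.in_degree() if deg == 0)

# Direct reports of a given manager
reports_of_a = sorted(G.successors("A"))

print(top_level, reports_of_a)
```

Because the graph is built straight from the rows, managers who never appear in the "Worker" column simply become nodes without incoming edges.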
Example:
# input
df = pd.DataFrame({'manager': ['A', 'B', 'A'], 'Worker': ['D', 'E', 'F']})
# adjacency matrix
   A  B  D  E  F
A  0  0  1  0  1
B  0  0  0  1  0
D  0  0  0  0  0
E  0  0  0  0  0
F  0  0  0  0  0
# adjacency matrix with your code
Worker    D    E    F
Worker
D       0.0  0.0  0.0
E       0.0  0.0  0.0
F       0.0  0.0  0.0
A       1.0  NaN  1.0  # those rows are created
B       NaN  1.0  NaN  # after initializing am
Upvotes: 1