mrgou

Reputation: 2510

Adjacency matrix not square error from square dataframe with networkx

I have code that builds a graph from an adjacency matrix derived from a table that maps workers to their managers. The source is a table with two columns (Worker, manager). It works perfectly on a small mock data set, but fails unexpectedly with the real data:

import pandas as pd
import networkx as nx

# Read input
df = pd.read_csv("org.csv")

# Create the input adjacency matrix
am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,
# or that index and columns don't match

# Fill the matrix
for ix, row in df.iterrows():
    am.at[row["manager"], row["Worker"]] = 1

# At this point, am.shape returns a square dataframe (2825,2825)
# Generate the graph
G = nx.from_pandas_adjacency(am, create_using=nx.DiGraph)

This returns: NetworkXError: Adjacency matrix not square: nx,ny=(2825, 2829)

And indeed, the dimensions reported in the error are not the same as those of the input dataframe am.

Does anyone have an idea of what happens in from_pandas_adjacency that could lead to this mismatch?

Upvotes: 2

Views: 41

Answers (2)

ThomasIsCoding

Reputation: 102419

First of all, your "adjacency matrix" is not a true adjacency matrix; it is effectively an "incidence matrix".

I didn't find a straightforward utility in networkx that supports generating a directed graph from the incidence matrix. However, the igraph package in R has such functionality, which can show how it should work. For example:

library(igraph)

df <- data.frame(
    manager = c("A", "B", "A"),
    worker = c("D", "E", "F")
)
am <- table(df)
g <- graph_from_biadjacency_matrix(am, directed = TRUE, mode = "out")
plot(g)

where

> print(df)
  manager worker
1       A      D
2       B      E
3       A      F

> print(am)
       worker
manager D E F
      A 1 0 1
      B 0 1 0

such that g can be visualized as a directed graph with the edges A -> D, A -> F, and B -> E.

Again, the real "adjacency matrix" should look like this

> as_adjacency_matrix(g)
5 x 5 sparse Matrix of class "dgCMatrix"
  A B D E F
A . . 1 . 1
B . . . 1 .
D . . . . .
E . . . . .
F . . . . .
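
For readers staying in Python, the same idea can be sketched with pandas and networkx. This is only an illustration, assuming the small manager/worker table from the R example: the crosstab plays the role of `table(df)`, and the name `bi` and the explicit edge loop are illustrative choices, not a dedicated networkx utility.

```python
import pandas as pd
import networkx as nx

# Same toy data as the R example above
df = pd.DataFrame({"manager": ["A", "B", "A"], "worker": ["D", "E", "F"]})

# Managers as rows, workers as columns (the rectangular matrix from table(df))
bi = pd.crosstab(df["manager"], df["worker"])

# Every non-zero cell is one directed edge manager -> worker
edges = [(m, w) for m in bi.index for w in bi.columns if bi.at[m, w] > 0]

G = nx.DiGraph()
G.add_edges_from(edges)

# The true adjacency matrix is square over ALL nodes (managers and workers)
adj = nx.to_pandas_adjacency(G, nodelist=sorted(G.nodes), dtype=int)
print(adj)
```

The rectangular matrix has one row per manager and one column per worker, while the adjacency matrix has one row and one column per node, which is why the two shapes differ.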

Upvotes: 0

mozway

Reputation: 262114

In:

am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,

your DataFrame is indeed square, but when you later assign values in the loop, if you have a manager that is not in "Worker", this will create a new row:

am.at[row["manager"], row["Worker"]] = 1
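
A minimal reproduction of this effect, assuming a three-row toy table in which no manager appears in the "Worker" column, could look like the sketch below. The names `shape_before`, `shape_after`, and `missing` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

# Square at creation time: rows and columns are both the workers
am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
shape_before = am.shape

# Assigning with .at to a row label that does not exist appends a new row
for _, row in df.iterrows():
    am.at[row["manager"], row["Worker"]] = 1
shape_after = am.shape

# Managers that never appear as workers are the labels that add rows
missing = sorted(set(df["manager"]) - set(df["Worker"]))

print(shape_before, shape_after, missing)
```

Running the same `missing` check on the real data should list the managers that are absent from the "Worker" column and therefore make the matrix non-square.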

Better to avoid the loop: use a crosstab, then reindex on the whole set of nodes:

am = pd.crosstab(df['manager'], df['Worker'])
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)
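
As a quick check, here is a sketch that runs the crosstab approach on a toy table (the three rows are an assumption for illustration) and passes the result to `nx.from_pandas_adjacency`:

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

# Rectangular count table: managers as rows, workers as columns
am = pd.crosstab(df["manager"], df["Worker"])

# Make it square over the union of all labels, filling absent cells with 0
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)

# Row label -> column label, so edges point from manager to worker
G = nx.from_pandas_adjacency(am, create_using=nx.DiGraph)
edges = sorted(G.edges())
print(am.shape, edges)
```

Note that `pd.crosstab` counts occurrences, so a (manager, Worker) pair that appears more than once in the table yields an entry greater than 1, which should end up as the edge weight rather than as a plain 0/1 flag.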

Even better, if you don't really need the adjacency matrix, directly create the graph with nx.from_pandas_edgelist:

G = nx.from_pandas_edgelist(df, source='manager', target='Worker',
                            create_using=nx.DiGraph)
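
A small self-contained sketch of the edge-list route, again assuming a three-row toy table, is shown here. The `roots` list is an illustrative addition: nodes with no incoming edge are exactly the managers that never appear in "Worker", i.e. the ones that broke the square matrix in the question.

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

# Each row becomes one directed edge manager -> Worker; no matrix is needed
G = nx.from_pandas_edgelist(df, source="manager", target="Worker",
                            create_using=nx.DiGraph)

edges = sorted(G.edges())

# Nodes without an incoming edge: managers that are not listed as workers
roots = sorted(n for n, deg in G.in_degree() if deg == 0)

print(G.number_of_nodes(), edges, roots)
```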

Example:

# input
df = pd.DataFrame({'manager': ['A', 'B', 'A'], 'Worker': ['D', 'E', 'F']})

# adjacency matrix
   A  B  D  E  F
A  0  0  1  0  1
B  0  0  0  1  0
D  0  0  0  0  0
E  0  0  0  0  0
F  0  0  0  0  0

# adjacency matrix with your code
Worker    D    E    F
Worker               
D       0.0  0.0  0.0
E       0.0  0.0  0.0
F       0.0  0.0  0.0
A       1.0  NaN  1.0  # those rows are created 
B       NaN  1.0  NaN  # after initializing am


Upvotes: 1
