Reputation: 416

Importing non-square adjacency matrix into Networkx python

I have some data in pandas dataframe form below, where the columns represent discrete skills and the rows represent discrete jobs. A 1 is present only if the skill is required by the job, otherwise 0.

       skill_1  skill_2
job_1        1        0
job_2        0        0
job_3        1        1

I want to create a graph with networkx to visualize this relationship between jobs and skills. I've tried two methods: nx.from_pandas_adjacency on the dataframe itself, and nx.from_numpy_matrix on a NumPy representation of the dataframe with the column and row names removed.

In either case, an error was raised because this is a non-square matrix. This makes sense, as networkx presumably interprets the rows and columns as the same set of nodes. However, the rows and columns represent distinctly different things here: two jobs are connected by the skill(s) they share and two skills are connected by the job(s) they share, but there is no direct edge between any two skills or any two jobs.

How can I import my data into networkx given that my rows and columns are different sets of nodes?

Upvotes: 5

Answers (3)

cjmaria

Reputation: 340

As mentioned by ComplexGates, what you have here is a biadjacency matrix. I see that you've added a solution where you fill in the rest of the matrix with zeros to make it square. However, I suspect what you really wanted was to convert a biadjacency matrix into a (square) adjacency matrix, which is different from the posted solution.

For a biadjacency matrix A with m rows and n columns, you can convert it into an adjacency matrix of size (m+n)x(m+n) like so:

$$
\begin{pmatrix}
0_{n \times n} & A^{T} \\
A_{m \times n} & 0_{m \times m}
\end{pmatrix}
$$

In other words, put A at the bottom left of the (m+n)x(m+n) matrix, and the transpose of A at the top right, and fill the remaining space with zeros.
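As a worked example (an illustration added here, not part of the original answer), the question's data has m = 3 jobs and n = 2 skills. Ordering the nodes as skill_1, skill_2, job_1, job_2, job_3, the construction above should give the following symmetric 5x5 matrix, with A in rows 3 to 5 / columns 1 to 2 and A^T in rows 1 to 2 / columns 3 to 5:

$$
\begin{pmatrix}
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0
\end{pmatrix}
$$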

In code, if A is a 2D Numpy array, you might do something like:

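import numpy as np

def bipartite_to_adjacency(A):
    # A is an m x n biadjacency matrix (rows = jobs, columns = skills)
    m, n = A.shape
    Z_mm = np.zeros((m, m), dtype=int)
    Z_nn = np.zeros((n, n), dtype=int)
    # Top block row: [0_nxn | A^T], bottom block row: [A | 0_mxm]
    top_partition = np.concatenate((Z_nn, np.transpose(A)), axis=1)
    bottom_partition = np.concatenate((A, Z_mm), axis=1)
    # Stack the block rows into a symmetric (m+n) x (m+n) adjacency matrix
    return np.concatenate((top_partition, bottom_partition))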
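To get from this matrix to a networkx graph, one option is nx.from_numpy_array followed by nx.relabel_nodes to restore the job and skill names. The sketch below assumes the question's sample DataFrame and builds the same block layout with np.block instead of the concatenate calls above; the names adj and labels are illustrative only.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Sample data from the question: rows are jobs, columns are skills
df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

A = df.to_numpy()
m, n = A.shape

# Same layout as bipartite_to_adjacency: [[0_nxn, A^T], [A, 0_mxm]]
adj = np.block([[np.zeros((n, n), dtype=A.dtype), A.T],
                [A, np.zeros((m, m), dtype=A.dtype)]])

# Undirected graph with integer nodes 0..m+n-1
G = nx.from_numpy_array(adj)

# Node i corresponds to row/column i of adj: skills first, then jobs
labels = dict(enumerate(list(df.columns) + list(df.index)))
G = nx.relabel_nodes(G, labels)

print(sorted(G.edges()))
```

Because the matrix is symmetric, the default undirected nx.Graph fits here; the skills-first node order is what makes the label mapping line up with the rows of adj.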

Upvotes: 0

ComplexGates

Reputation: 743

You have a bipartite graph. Networkx can create this network from your original (bi)adjacency matrix using nx.algorithms.bipartite.matrix.from_biadjacency_matrix
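A minimal sketch of this approach, assuming the question's sample DataFrame and that SciPy is installed. In the networkx versions I'm aware of, from_biadjacency_matrix expects a SciPy sparse matrix rather than a DataFrame, numbers the row nodes 0..m-1 and the column nodes m..m+n-1, and records the side in a bipartite node attribute; the relabeling step that restores the job and skill names is an addition for illustration.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite
from scipy.sparse import csr_matrix

# Sample data from the question: rows are jobs, columns are skills
df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

# Convert the DataFrame values to a sparse matrix before building the graph
B = bipartite.from_biadjacency_matrix(csr_matrix(df.to_numpy()))

# Rows become nodes 0..m-1 (bipartite=0), columns become m..m+n-1 (bipartite=1)
labels = dict(enumerate(list(df.index) + list(df.columns)))
B = nx.relabel_nodes(B, labels)

# Recover each side of the graph from the node attribute
jobs = {node for node, side in B.nodes(data='bipartite') if side == 0}
skills = {node for node, side in B.nodes(data='bipartite') if side == 1}
```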

Upvotes: 2

CDJB

Reputation: 14506

One option is to generate the missing rows and columns, filled with zeros, so that the matrix becomes square.

(I was curious about a vectorised way to do this, so I asked this question, whose answers provide such a method.)

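import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

edges = df.columns  # the original (skill) columns

# Add an all-zero column for every job
for i in df.index:
    df[i] = [0 for _ in range(len(df.index))]

# Add an all-zero row for every skill
# (pd.concat instead of DataFrame.append, which was removed in pandas 2.0)
for e in edges:
    df = pd.concat([df, pd.DataFrame({c: 0 for c in df.columns}, index=[e])])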

Which gives us:

>>> df
         skill_1  skill_2  job_1  job_2  job_3
job_1          1        0      0      0      0
job_2          0        0      0      0      0
job_3          1        1      0      0      0
skill_1        0        0      0      0      0
skill_2        0        0      0      0      0

Then we can read it into networkx using nx.from_pandas_adjacency (assuming you want a directed graph):

G = nx.from_pandas_adjacency(df, create_using=nx.DiGraph)
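The two loops can also be replaced by a single DataFrame.reindex call. This is a sketch of one possible vectorised approach, not necessarily the one given in the linked answers; it assumes every job and skill label is distinct, since Index.union merges duplicate names.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

# One label list covering both jobs and skills
labels = df.index.union(df.columns)

# reindex adds the missing rows and columns in one step, filling them with zeros
square = df.reindex(index=labels, columns=labels, fill_value=0)

G = nx.from_pandas_adjacency(square, create_using=nx.DiGraph)
```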

Alternatively, we can use df.stack():

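import networkx as nx
import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

G = nx.DiGraph()

# stack() turns the matrix into one row per (job, skill) pair
pairs = df.stack().reset_index()
pairs.columns = ['job', 'skill', 'value']

for row in pairs.itertuples(index=False):
    # Add both endpoints so that jobs/skills without edges still appear
    G.add_node(row.job)
    G.add_node(row.skill)
    if row.value:
        G.add_edge(row.job, row.skill)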
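The loop can also be avoided by keeping only the nonzero entries of the stacked Series and passing their (job, skill) index tuples straight to add_edges_from. A minimal sketch, assuming the question's sample DataFrame; the bipartite=0/1 node attribute is an optional extra that follows the convention of networkx's bipartite module, not something the approach requires.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

stacked = df.stack()  # Series indexed by (job, skill) pairs

G = nx.DiGraph()
# Add every job and skill as a node, tagged with its side of the bipartite graph
G.add_nodes_from(df.index, bipartite=0)
G.add_nodes_from(df.columns, bipartite=1)
# Each nonzero entry's (job, skill) index tuple becomes a directed edge
G.add_edges_from(stacked[stacked != 0].index)
```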

Upvotes: 1
