Reputation: 416
I have some data in pandas dataframe form below, where the columns represent discrete skills and the rows represent discrete jobs. A 1 is present only if the skill is required by the job, otherwise 0.
       skill_1  skill_2
job_1        1        0
job_2        0        0
job_3        1        1
I want to create a graph to visualize this relationship between jobs and skills, using networkx. I've tried two methods: nx.from_pandas_adjacency, applied to the dataframe itself, and nx.from_numpy_matrix, applied to a numpy representation of the dataframe with the column and row names removed.
In both cases an error was raised because this is a non-square matrix. This makes sense, as networkx is likely interpreting both columns and rows as the same set of nodes. However, the columns and rows represent distinctly different things here. Two jobs are connected by the skill(s) they share and two skills are connected by the job(s) they share, but there is no direct edge between any two skills or any two jobs.
How can I import my data into networkx given that my rows and columns are different sets of nodes?
Upvotes: 5
Views: 2435
Reputation: 340
As mentioned by ComplexGates, what you have here is a biadjacency matrix. I see that you've added a solution where you fill in the rest of the matrix with zeros to make it square. However, I suspect what you really wanted was a way to convert a biadjacency matrix into a (square) adjacency matrix, which is different from the posted solution.
For a biadjacency matrix A with m rows and n columns, you can convert it into an adjacency matrix of size (m+n)x(m+n) like so:
┏ ┓
┃0_nxn A^T ┃
┃A_mxn 0_mxm┃
┗ ┛
In other words, put A at the bottom left of the (m+n)x(m+n) matrix, and the transpose of A at the top right, and fill the remaining space with zeros.
In code, if A is a 2D Numpy array, you might do something like:
import numpy as np

def bipartite_to_adjacency(A):
    # A is the m x n biadjacency matrix
    m, n = A.shape
    Z_mm = np.zeros((m, m), dtype=int)
    Z_nn = np.zeros((n, n), dtype=int)
    # Top block row: [0_nxn, A^T]; bottom block row: [A, 0_mxm]
    top_partition = np.concatenate((Z_nn, np.transpose(A)), axis=1)
    bottom_partition = np.concatenate((A, Z_mm), axis=1)
    return np.concatenate((top_partition, bottom_partition))
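As a quick check of the block layout, here is a minimal sketch that builds the same matrix for the question's 3x2 example. It uses np.block as a compact alternative to the concatenate calls; the variable name adjacency is only illustrative. Note the resulting node order: the first n indices are the columns of A (skills), followed by the m rows (jobs).

```python
import numpy as np

# Biadjacency matrix from the question: rows are jobs, columns are skills.
A = np.array([[1, 0],
              [0, 0],
              [1, 1]])
m, n = A.shape

# Same layout as above: [[0_nxn, A^T], [A, 0_mxm]].
adjacency = np.block([
    [np.zeros((n, n), dtype=int), A.T],
    [A, np.zeros((m, m), dtype=int)],
])

print(adjacency.shape)  # (m + n, m + n)
print(adjacency)
```

The result is symmetric, so it describes an undirected graph in which every edge joins a skill index to a job index.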
Upvotes: 0
Reputation: 743
You have a bipartite graph. NetworkX can create this network directly from your original (bi)adjacency matrix using nx.algorithms.bipartite.matrix.from_biadjacency_matrix.
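A minimal sketch of that call on the question's data, assuming SciPy is installed (the function expects a SciPy sparse matrix rather than a DataFrame). The function numbers the row nodes 0..m-1 and the column nodes m..m+n-1, so the relabelling step below, and the names B, labels and G, are my own additions to restore the job and skill names.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite
from scipy import sparse

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

# Build the bipartite graph from the sparse biadjacency matrix.
# Row nodes get the attribute bipartite=0, column nodes bipartite=1.
B = bipartite.from_biadjacency_matrix(sparse.csr_matrix(df.to_numpy()))

# Map the integer node ids back to the dataframe's row and column labels.
labels = dict(enumerate(list(df.index) + list(df.columns)))
G = nx.relabel_nodes(B, labels)

print(sorted(G.nodes(data='bipartite')))
print(list(G.edges()))
```

Jobs with no required skills, such as job_2 here, are kept as isolated nodes.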
Upvotes: 2
Reputation: 14506
(I was curious about a vectorised method to achieve this, so I asked this question, whose answers provide such a method.)
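import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})
skills = df.columns
# Add an all-zero column for every job
for job in df.index:
    df[job] = 0
# Add an all-zero row for every skill
# (pd.concat is used because DataFrame.append was removed in pandas 2.0)
zero_rows = pd.DataFrame(0, index=skills, columns=df.columns)
df = pd.concat([df, zero_rows])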
Which gives us:
>>> df
         skill_1  skill_2  job_1  job_2  job_3
job_1          1        0      0      0      0
job_2          0        0      0      0      0
job_3          1        1      0      0      0
skill_1        0        0      0      0      0
skill_2        0        0      0      0      0
And then we can read it into networkx using nx.from_pandas_adjacency (assuming you want a directed graph):
import networkx as nx

G = nx.from_pandas_adjacency(df, create_using=nx.DiGraph)
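For comparison, one vectorised way to build the same square dataframe without loops is DataFrame.reindex with fill_value=0. This is only a sketch of one possibility, not necessarily what the linked answers propose; the names nodes and square are illustrative. Index.union sorts the labels, so the jobs come before the skills here.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

# All node labels: jobs (rows) and skills (columns) together.
nodes = df.index.union(df.columns)

# Expand to a square matrix over all nodes, filling missing cells with 0.
square = df.reindex(index=nodes, columns=nodes, fill_value=0)

# Rows are sources and columns are targets, so edges run job -> skill.
G = nx.from_pandas_adjacency(square, create_using=nx.DiGraph)
print(sorted(G.edges()))
```

Passing nx.Graph instead of nx.DiGraph would not work directly on this matrix's intent if you need symmetry checks elsewhere, so for an undirected graph it may be simpler to build the symmetric block matrix first.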
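# Alternatively, build the graph directly from the (job, skill) pairs
import networkx as nx
import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})
G = nx.DiGraph()
# stack() yields one row per (job, skill) pair; the value column is named 0
for _, row in df.stack().reset_index().iterrows():
    G.add_node(row['level_0'])
    G.add_node(row['level_1'])
    if row[0]:
        G.add_edge(row['level_0'], row['level_1'])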
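The same graph can also be assembled without iterating over rows, by filtering the stacked series for the cells equal to 1 and passing the resulting index tuples to add_edges_from. This is a sketch under the assumption that the dataframe contains only 0s and 1s; the names stacked and edge_list are illustrative.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'skill_1': {'job_1': 1, 'job_2': 0, 'job_3': 1},
                   'skill_2': {'job_1': 0, 'job_2': 0, 'job_3': 1}})

# One entry per (job, skill) pair, indexed by a two-level MultiIndex.
stacked = df.stack()

# Keep only the pairs where the skill is required by the job.
edge_list = stacked[stacked == 1].index.tolist()

G = nx.DiGraph()
G.add_nodes_from(df.index)    # jobs, including those with no skills
G.add_nodes_from(df.columns)  # skills
G.add_edges_from(edge_list)   # directed job -> skill edges
print(edge_list)
```

Adding the nodes explicitly keeps isolated jobs such as job_2 in the graph, which add_edges_from alone would omit.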
Upvotes: 1