EmJ

Reputation: 4618

How to convert a co-occurrence matrix to networkx graph

I am using the following code to convert my list of lists to a co-occurrence matrix.

import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='')
       .groupby(level=0, axis=1)
       .sum())

v = u.T.dot(u)
v.values[(np.r_[:len(v)], ) * 2] = 0

print(v)

My output is as follows:

   a  b  c  d  e
a  0  1  0  1  0
b  1  0  1  1  2
c  0  1  0  1  1
d  1  1  1  0  1
e  0  2  1  1  0

I would like to transform my co-occurrence matrix to a weighted undirected networkx graph where weights represent the co-occurrence count in the matrix.

So far I have tried the following, but I am not sure how to add the weights to the graph.

import networkx as nx

print("get x and y pairs")
#get (x,y) pairs from the cooccurrence matrix
arr = np.where(v>=1)
corrs = [(v.index[x], v.columns[y]) for x, y in zip(*arr)]

#get the unique pairs
final_arr = []

for x, y in corrs:
    if (y,x) not in final_arr:
        final_arr.append((x,y))

#construct the graph
G = nx.Graph()
nodes_vocabulary_list = ['a', 'b', 'c', 'd', 'e']
G.add_nodes_from(nodes_vocabulary_list)
G.add_edges_from(final_arr)

Is there an easier way to do this in networkx?

I am happy to provide more details if needed.

Upvotes: 4

Views: 2984

Answers (1)

jezrael

Reputation: 863731

I believe you can use:

import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

#sum the dummy columns that share a label
u = pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='').T.groupby(level=0).sum().T

v = u.T.dot(u)
#keep only the strict upper triangle; set the diagonal and lower triangle to 0
v = v.where(np.triu(np.ones(v.shape, dtype=bool), k=1), 0)
print(v)
   a  b  c  d  e
a  0  1  0  1  0
b  0  0  1  1  2
c  0  0  0  1  1
d  0  0  0  0  1
e  0  0  0  0  0

#reshape and filter only count > 0
a = v.stack()
a = a[a >= 1].rename_axis(('source', 'target')).reset_index(name='weight')
print(a)
  source target  weight
0      a      b       1
1      a      d       1
2      b      c       1
3      b      d       1
4      b      e       2
5      c      d       1
6      c      e       1
7      d      e       1
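
As a cross-check on the table above, the same edge list can probably be built straight from the nested list, without the intermediate matrix. This is only a sketch: the names pair_counts and edges are made up for the example, and it assumes a label appears at most once per inner list (set() drops repeats, which the dummy-matrix product would count differently).

```python
from collections import Counter
from itertools import combinations

import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

# Count every unordered pair once per inner list.
# sorted(set(...)) gives each pair a canonical (smaller, larger) order,
# so ('b', 'e') and ('e', 'b') land in the same counter key.
pair_counts = Counter(
    pair
    for items in lst
    for pair in combinations(sorted(set(items)), 2)
)

# Same three columns as the stacked matrix: source, target, weight.
edges = pd.DataFrame(
    [(s, t, w) for (s, t), w in sorted(pair_counts.items())],
    columns=['source', 'target', 'weight'],
)
print(edges)
```

The resulting frame has the column names that from_pandas_edgelist expects by default, so it can be used in the same way as a.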

Then create the graph with from_pandas_edgelist:

import networkx as nx
G = nx.from_pandas_edgelist(a,  edge_attr=True)

print (nx.to_dict_of_dicts(G))
{'a': {'b': {'weight': 1}, 'd': {'weight': 1}}, 
 'b': {'a': {'weight': 1}, 'c': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 2}}, 
 'd': {'a': {'weight': 1}, 'b': {'weight': 1}, 'c': {'weight': 1}, 'e': {'weight': 1}}, 
 'c': {'b': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 1}}, 
 'e': {'b': {'weight': 2}, 'c': {'weight': 1}, 'd': {'weight': 1}}}
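
A possibly shorter route, sketched here under the assumption that the full symmetric matrix with a zero diagonal is available: nx.from_pandas_adjacency reads a square DataFrame as an adjacency matrix and stores each non-zero entry as the edge's weight attribute. The variable names co and arr are only illustrative.

```python
import networkx as nx
import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

# One indicator column per label; columns sharing a label are summed.
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='')
       .T.groupby(level=0).sum().T)

# Symmetric co-occurrence counts; the diagonal holds each label's own count.
co = u.T.dot(u)

# Zero the diagonal on a copy so no self-loops are created.
arr = co.to_numpy().copy()
np.fill_diagonal(arr, 0)
v = pd.DataFrame(arr, index=co.index, columns=co.columns)

# Zero entries are skipped; non-zero entries become weighted, undirected edges.
G = nx.from_pandas_adjacency(v)
print(sorted(G.edges(data='weight')))
```

With this approach there is no need to stack the matrix or keep only one triangle, because an undirected nx.Graph stores each pair once.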

Upvotes: 6
