Reputation: 4618
I am using the following code to convert my list of lists to a co-occurrence matrix.
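import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]
# one-hot encode every item, then merge duplicate column labels
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='', dtype=int)
       .T.groupby(level=0).sum().T)
v = u.T.dot(u)
# zero the diagonal so an item does not co-occur with itself
v = v.mask(np.eye(len(v), dtype=bool), 0)
print(v)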
My output is as follows:
   a  b  c  d  e
a  0  1  0  1  0
b  1  0  1  1  2
c  0  1  0  1  1
d  1  1  1  0  1
e  0  2  1  1  0
I would like to transform my co-occurrence matrix into a weighted undirected networkx graph, where the edge weights represent the co-occurrence counts in the matrix.
So far I have tried the following, but I am not sure how to add the weights to the graph.
import networkx as nx

print("get x and y pairs")
# get (x, y) pairs from the co-occurrence matrix
arr = np.where(v >= 1)
corrs = [(v.index[x], v.columns[y]) for x, y in zip(*arr)]
# get the unique pairs
final_arr = []
for x, y in corrs:
    if (y, x) not in final_arr:
        final_arr.append((x, y))
# construct the graph
G = nx.Graph()
nodes_vocabulary_list = ['a', 'b', 'c', 'd', 'e']
G.add_nodes_from(nodes_vocabulary_list)
G.add_edges_from(final_arr)
Is there an easier way to do this in networkx?
I am happy to provide more details if needed.
Upvotes: 4
Views: 2984
Reputation: 863731
I believe you can use:
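import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]
# one-hot encode every item, then merge duplicate column labels
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='', dtype=int)
       .T.groupby(level=0).sum().T)
v = u.T.dot(u)
# keep only the strict upper triangle: set the diagonal and lower triangle to 0
v = v.where(np.triu(np.ones(v.shape, dtype=bool), k=1), 0)
print(v)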
   a  b  c  d  e
a  0  1  0  1  0
b  0  0  1  1  2
c  0  0  0  1  1
d  0  0  0  0  1
e  0  0  0  0  0
#reshape and filter only count > 0
a = v.stack()
a = a[a >= 1].rename_axis(('source', 'target')).reset_index(name='weight')
print(a)
  source target  weight
0      a      b       1
1      a      d       1
2      b      c       1
3      b      d       1
4      b      e       2
5      c      d       1
6      c      e       1
7      d      e       1
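As a cross-check on the edge list above, the same pair counts can be sketched with the standard library alone. This is only an illustration, not part of the pandas approach: the names pair_counts and edges are made up here, and it assumes no item is repeated inside a single inner list (repeats are dropped by set(), whereas the matrix product would count them more than once).

```python
from collections import Counter
from itertools import combinations

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e'],
]

# Count each unordered pair once per inner list; sorting the items
# makes ('b', 'a') and ('a', 'b') map to the same key.
pair_counts = Counter()
for items in lst:
    pair_counts.update(combinations(sorted(set(items)), 2))

# (source, target, weight) triples in the same order as the table above
edges = sorted((s, t, w) for (s, t), w in pair_counts.items())
for edge in edges:
    print(edge)
```

Triples in this shape can be passed straight to G.add_weighted_edges_from(edges) on an nx.Graph if the DataFrame step is not needed.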
Create the graph with from_pandas_edgelist:
import networkx as nx
G = nx.from_pandas_edgelist(a, edge_attr=True)
print(nx.to_dict_of_dicts(G))
{'a': {'b': {'weight': 1}, 'd': {'weight': 1}},
'b': {'a': {'weight': 1}, 'c': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 2}},
'd': {'a': {'weight': 1}, 'b': {'weight': 1}, 'c': {'weight': 1}, 'e': {'weight': 1}},
'c': {'b': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 1}},
'e': {'b': {'weight': 2}, 'c': {'weight': 1}, 'd': {'weight': 1}}}
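A possibly shorter route, sketched here as an alternative rather than as part of the steps above: since the symmetric matrix from the question (zero diagonal) is already a weighted adjacency matrix, nx.from_pandas_adjacency should be able to read it directly. Zero entries produce no edge, and each non-zero entry is stored under the 'weight' attribute. The matrix is typed in by hand below so the snippet is self-contained; labels is just an illustrative name.

```python
import networkx as nx
import pandas as pd

labels = ['a', 'b', 'c', 'd', 'e']
# symmetric co-occurrence matrix from the question, diagonal already zeroed
v = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 2, 1, 1, 0]],
    index=labels, columns=labels,
)

# index and columns must carry the same labels; the result is an undirected Graph
G = nx.from_pandas_adjacency(v)
print(sorted(G.edges(data='weight')))
```

With this route the lower-triangle masking and the stack/reset_index step are not needed, because an undirected graph stores each pair only once.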
Upvotes: 6