Reputation: 200
I am working with Python and networkx, and this is my first project with this tool. I would like to build a graph to analyze the similarity between some strings. I have used cosine similarity to measure how similar the strings are.
Below is the code I've used so far:
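import string

import networkx as nx
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# `data` and `stop_words` are defined elsewhere in my project
skills = []
for i in data['skills']:
    skills.append(i)

def clean_string(text):
    # drop punctuation, lowercase, then drop stop words
    text = ''.join([word for word in text if word not in string.punctuation])
    text = text.lower()
    text = ' '.join([word for word in text.split() if word not in stop_words])
    return text

cleaned = list(map(clean_string, skills))
# print(cleaned)
vectorizer = CountVectorizer().fit_transform(cleaned)
vectors = vectorizer.toarray()
# print(vectors)
csim = cosine_similarity(vectors)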
I want the cosine similarity to be the weight of the edges in my network.
G = nx.from_numpy_matrix(np.matrix(csim), create_using=nx.DiGraph)
Then I try to filter the edges whose weight is above the threshold of 0.2.
def slice_network(G, T, data = True):
    """ Remove all edges with weight<T from G or its copy. """
    F = G.copy() if copy else G
    F.remove_edges_from((n1, n2) for n1, n2, w in F.edges(data="weight") if w < T)
    return G
F = slice_network(G, 0.2)
print(F.edges())
However, it throws me the error:
RuntimeError: dictionary changed size during iteration
Could someone help me?
Upvotes: 2
Views: 1826
Reputation: 4892
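You simply need to add [] to your remove_edges_from call (and you should return F instead of G). Without the brackets you pass a generator, which is evaluated lazily: edges are deleted while F.edges() is still being iterated, and that causes the RuntimeError. With the brackets, the list of edges to remove is built completely before the first deletion. From your other question I created a minimal reproducible example: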
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
simple_weights = [[1., 0.51639778, 0., 0., 0., 0.],
                  [0.51639778, 1., 0., 0., 0., 0.25819889],
                  [0., 0., 1., 0., 0., 0.33333333],
                  [0., 0., 0., 1., 0.65465367, 0.],
                  [0., 0., 0., 0.65465367, 1., 0.],
                  [0., 0.25819889, 0.33333333, 0., 0., 1.]]
# Note: from_numpy_matrix was removed in networkx 3.0; use nx.from_numpy_array there.
G = nx.from_numpy_matrix(np.array(simple_weights), create_using=nx.DiGraph)
nx.draw(G)
plt.show()
F = G.copy()
threshold = 0.4
F.remove_edges_from([(n1, n2) for n1, n2, w in F.edges(data="weight") if w < threshold])
nx.draw(F)
plt.show()
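Or as your function (you haven't defined copy in your code above, so here it is a keyword parameter that replaces the unused data argument):
def slice_network(G, T, copy=True):
    """ Remove all edges with weight<T from G or its copy. """
    F = G.copy() if copy else G
    # build the full list first, so nothing is deleted while F.edges() is iterated
    F.remove_edges_from([(n1, n2) for n1, n2, w in F.edges(data="weight") if w < T])
    return F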
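To see why the square brackets matter, here is a minimal sketch that uses a plain dict-of-dicts instead of a real graph (networkx stores adjacency in nested dicts as well). The helper names make_adj, weak_edges and remove_edges are made up for this illustration and are not part of networkx.

```python
# Hypothetical stand-in for a graph: adj[u][v] = weight.
def make_adj():
    return {
        0: {1: 0.10, 2: 0.52},
        1: {0: 0.10},
        2: {0: 0.52},
    }


def weak_edges(adj, threshold):
    """Lazily yield (u, v) pairs whose weight is below the threshold."""
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            if w < threshold:
                yield u, v


def remove_edges(adj, edges):
    """Delete each (u, v) entry from the adjacency mapping."""
    for u, v in edges:
        del adj[u][v]


# Lazy generator: the first deletion happens while adj[0] is still being iterated.
adj = make_adj()
try:
    remove_edges(adj, weak_edges(adj, 0.2))
    lazy_error = None
except RuntimeError as exc:
    lazy_error = str(exc)

# Materialized list: iteration is finished before anything is deleted.
adj = make_adj()
remove_edges(adj, list(weak_edges(adj, 0.2)))
remaining = sorted((u, v) for u, nbrs in adj.items() for v in nbrs)

print(lazy_error)
print(remaining)
```

The first call fails with the same message as in the question, while the second one leaves only the edges at or above the threshold.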
Or filter the weights before creating the graph:
threshold = 0.4
simple_weights = np.array(simple_weights)
simple_weights[simple_weights<threshold] = 0
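For completeness, a sketch of how the thresholded matrix could then be turned into a graph. It assumes a networkx version that provides nx.from_numpy_array (from_numpy_matrix no longer exists in networkx 3.0 and later), and the variable names weights and kept are only illustrative. Zero entries do not produce edges, so everything below the threshold disappears; the diagonal of ones still produces self-loops, which the last line skips when listing the edges.

```python
import networkx as nx
import numpy as np

simple_weights = [[1., 0.51639778, 0., 0., 0., 0.],
                  [0.51639778, 1., 0., 0., 0., 0.25819889],
                  [0., 0., 1., 0., 0., 0.33333333],
                  [0., 0., 0., 1., 0.65465367, 0.],
                  [0., 0., 0., 0.65465367, 1., 0.],
                  [0., 0.25819889, 0.33333333, 0., 0., 1.]]

threshold = 0.4
weights = np.array(simple_weights)
weights[weights < threshold] = 0  # zero out weak similarities

# Only non-zero entries become edges, weighted by the matrix value.
F = nx.from_numpy_array(weights, create_using=nx.DiGraph)

# Edges between distinct nodes (ignoring the self-loops from the diagonal).
kept = sorted((u, v) for u, v in F.edges() if u != v)
print(kept)
```

This avoids removing edges afterwards, at the cost of losing the weak weights entirely; if you still need the full graph G, the remove_edges_from approach on a copy is the better fit.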
Upvotes: 2