DaniB

Reputation: 200

RuntimeError when filtering out edges with weight below a threshold - NetworkX

I am working with Python and NetworkX, and this is my first project with it. I would like to build a graph to analyze the similarity between some strings. FYI, I used cosine similarity to compute the similarity between the strings.

Below is the code I've used so far:

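import string

import networkx as nx
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumes `data` is a pandas DataFrame with a 'skills' column of strings
# and `stop_words` is a set of stop words (e.g. from NLTK); neither is shown here.
skills = list(data['skills'])


def clean_string(text):
    # Drop punctuation, character by character
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = text.lower()
    # Drop stop words
    text = ' '.join(word for word in text.split() if word not in stop_words)
    return text

cleaned = list(map(clean_string, skills))
# print(cleaned)

# Bag-of-words counts: one row per cleaned string
vectors = CountVectorizer().fit_transform(cleaned).toarray()
# print(vectors)

# Pairwise cosine similarity between the rows
csim = cosine_similarity(vectors)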
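For readers new to the measure: the cosine similarity of two count vectors a and b is a·b / (‖a‖ ‖b‖). Since word counts are never negative, it ranges from 0 (no shared words) to 1 (same word proportions). A minimal NumPy-only sketch of that computation follows, using a hypothetical helper cosine_similarity_matrix and a made-up three-word vocabulary. It is intended to mirror what sklearn's cosine_similarity returns for dense input, not to replace it.

```python
import numpy as np


def cosine_similarity_matrix(counts):
    """Hypothetical helper: pairwise cosine similarity between the rows of counts.

    Computes dot(a, b) / (||a|| * ||b||) for every pair of rows.
    """
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    unit = counts / norms    # every non-zero row now has length 1
    return unit @ unit.T     # dot products of unit rows are the cosines


# Made-up bag-of-words counts over the vocabulary ["data", "python", "sql"]
vectors = np.array([[1, 1, 0],
                    [1, 1, 1],
                    [0, 0, 1]])
csim = cosine_similarity_matrix(vectors)
print(np.round(csim, 4))
```

Normalizing each row once and taking a single matrix product gives all pairs at once, instead of looping over every pair of strings.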

I want the cosine similarity to be the weight of the edges in my network.

# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array takes the array directly
G = nx.from_numpy_array(csim, create_using=nx.DiGraph)
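nx.from_numpy_array (like the older from_numpy_matrix) turns every nonzero entry (i, j) of the matrix into an edge from node i to node j, with that entry as its weight attribute. A cosine-similarity matrix is symmetric and has 1.0 on its diagonal, so the resulting DiGraph gets a self-loop on every node and each similar pair in both directions. The sketch below lists the edges such a conversion would produce on a made-up 3x3 similarity matrix, using a hypothetical NumPy helper rather than NetworkX itself.

```python
import numpy as np


def matrix_to_weighted_edges(matrix):
    """Hypothetical helper: the (i, j, weight) triples a directed graph built
    from `matrix` would contain, one per nonzero entry."""
    matrix = np.asarray(matrix, dtype=float)
    rows, cols = np.nonzero(matrix)  # indices of nonzero entries, row by row
    return [(int(i), int(j), float(matrix[i, j])) for i, j in zip(rows, cols)]


# Made-up similarity matrix: symmetric, with 1.0 on the diagonal
csim = np.array([[1.0, 0.8, 0.0],
                 [0.8, 1.0, 0.1],
                 [0.0, 0.1, 1.0]])
edges = matrix_to_weighted_edges(csim)
self_loops = [(i, j) for i, j, _ in edges if i == j]
print(len(edges), "edges, of which", len(self_loops), "are self-loops")
```

If the self-loops are not wanted, zeroing the diagonal with np.fill_diagonal(csim, 0) before building the graph removes them, since zero entries produce no edge.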

Then I try to keep only the edges whose weight is at least 0.2, i.e. remove every edge whose weight is below that threshold.

def slice_network(G, T, data = True):
    """ Remove all edges with weight<T from G or its copy. """
    F = G.copy() if copy else G
    F.remove_edges_from((n1, n2) for n1, n2, w in F.edges(data="weight") if w < T)
    return G

F = slice_network(G, 0.2)
print(F.edges())

However, it raises this error:

RuntimeError: dictionary changed size during iteration
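The error is ordinary Python behavior rather than something specific to NetworkX: a dict may not change size while it is being iterated over. NetworkX stores a graph's edges in nested dicts, and a generator expression passed to remove_edges_from is consumed one edge at a time while edges are already being deleted. The sketch below reproduces the same RuntimeError with a plain dict of made-up edge weights (the helper names are hypothetical) and shows that collecting the keys into a list first avoids it.

```python
def drop_small_lazily(weights, threshold):
    """Delete entries below threshold via a generator (reproduces the bug)."""
    # The generator reads weights.items() while the loop deletes keys,
    # so the dict changes size mid-iteration and Python raises RuntimeError.
    doomed = (key for key, w in weights.items() if w < threshold)
    for key in doomed:
        del weights[key]
    return weights


def drop_small_eagerly(weights, threshold):
    """Delete entries below threshold via a list (the fix)."""
    # The list comprehension finishes reading the dict before any deletion.
    doomed = [key for key, w in weights.items() if w < threshold]
    for key in doomed:
        del weights[key]
    return weights


edge_weights = {(0, 1): 0.1, (0, 2): 0.9, (1, 2): 0.05, (2, 0): 0.5}

try:
    drop_small_lazily(dict(edge_weights), 0.2)
except RuntimeError as err:
    print("lazy version failed:", err)

print("eager version kept:", drop_small_eagerly(dict(edge_weights), 0.2))
```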

Could someone help me?

Upvotes: 2

Views: 1826

Answers (1)

Sparky05

Reputation: 4892

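You simply need to add [] to your remove_edges_from call so that it receives a list instead of a generator (and you should return F instead of G). The generator walks over F.edges lazily, so remove_edges_from ends up deleting edges from the graph's internal dictionaries while they are still being iterated over, which is what raises the RuntimeError; the list comprehension collects all edges to remove before any of them is deleted. Based on your other question, I created a minimal reproducible example: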

import networkx as nx
import numpy as np
import matplotlib.pyplot as plt


simple_weights = [[1., 0.51639778, 0., 0., 0., 0.],
                  [0.51639778, 1., 0., 0., 0., 0.25819889],
                  [0., 0., 1., 0., 0., 0.33333333],
                  [0., 0., 0., 1., 0.65465367, 0.],
                  [0., 0., 0., 0.65465367, 1., 0.],
                  [0., 0.25819889, 0.33333333, 0., 0., 1.]]


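# from_numpy_array replaces from_numpy_matrix, which was removed in NetworkX 3.0
G = nx.from_numpy_array(np.array(simple_weights), create_using=nx.DiGraph)
nx.draw(G)
plt.show()

F = G.copy()
threshold = 0.4
# The list is fully built before remove_edges_from starts deleting edges
F.remove_edges_from([(n1, n2) for n1, n2, w in F.edges(data="weight") if w < threshold])
nx.draw(F)
plt.show()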

or as your function, with copy turned into a parameter (copy was never defined in your code above):

def slice_network(G, T, copy=True):
    """ Remove all edges with weight<T from G or its copy. """
    F = G.copy() if copy else G
    # Materialize the edges to drop before removing any of them
    F.remove_edges_from([(n1, n2) for n1, n2, w in F.edges(data="weight") if w < T])
    return F

or filter the weight matrix before creating the graph, so that entries below the threshold never become edges:

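threshold = 0.4
simple_weights = np.array(simple_weights)
# Zero entries produce no edge in from_numpy_array
simple_weights[simple_weights < threshold] = 0
G = nx.from_numpy_array(simple_weights, create_using=nx.DiGraph)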
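The in-place masking overwrites simple_weights. A non-mutating variant with np.where, sketched on the answer's own example matrix, keeps the original weights and makes it easy to count what will become an edge: every remaining nonzero entry, including the diagonal self-loops.

```python
import numpy as np

simple_weights = np.array([[1., 0.51639778, 0., 0., 0., 0.],
                           [0.51639778, 1., 0., 0., 0., 0.25819889],
                           [0., 0., 1., 0., 0., 0.33333333],
                           [0., 0., 0., 1., 0.65465367, 0.],
                           [0., 0., 0., 0.65465367, 1., 0.],
                           [0., 0.25819889, 0.33333333, 0., 0., 1.]])
threshold = 0.4

# np.where builds a new array, so simple_weights itself is left untouched;
# entries below the threshold become 0, the rest keep their weight
filtered = np.where(simple_weights >= threshold, simple_weights, 0.0)

# Each remaining nonzero entry (i, j) would become one directed edge i -> j
surviving = list(zip(*np.nonzero(filtered)))
print(len(surviving), "directed edges survive, including",
      int(np.count_nonzero(np.diag(filtered))), "self-loops")
```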

Upvotes: 2
