Reputation: 6643
I am trying to find similar words in a group of strings, using SequenceMatcher from difflib.
Once a similar word is found, I try to remove it with .remove(word) to avoid duplication, but I get the error ValueError: list.remove(x): x not in list.
Why am I unable to remove that element from the list?
tags = ['python', 'tips', 'tricks', 'resources', 'flask', 'cron', 'tools', 'scrabble', 'code challenges', 'github', 'fork', 'learning', 'game', 'itertools', 'random', 'sets', 'twitter', 'news', 'python', 'podcasts', 'data science', 'challenges', 'APIs', 'conda', '3.6', 'code challenges', 'code review', 'HN', 'github', 'learning', 'max', 'generators', 'scrabble', 'refactoring', 'iterators', 'itertools', 'tricks', 'generator', 'games']
from difflib import SequenceMatcher
similar_tags = []
for word1 in tag:
    for word2 in tag:
        if word1[0] == word2[0]:
            if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                similar_tags.append((word1, word2))
                tag.remove(word1)
print(similar_tags)  # added for debugging
But I get this error:
Traceback (most recent call last):
  File "tags.py", line 71, in <module>
    similar_tags = dict(get_similarities(tags))
  File "tags.py", line 52, in get_similarities
    tag.remove(word1)
ValueError: list.remove(x): x not in list
Upvotes: 0
Views: 455
Reputation: 5
You are modifying a list while you are iterating over it, which is a bad thing to do.
Instead, push the words to a new list, then remove from the tags list every item that exists in the new list. Try something like this:
similar_tags = []
to_be_removed = []
for word1 in tag:
    for word2 in tag:
        if word1[0] == word2[0]:
            if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                similar_tags.append((word1, word2))
                to_be_removed.append(word1)
for word in to_be_removed:
    if word in tag:
        tag.remove(word)
print(similar_tags)  # added for debugging
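For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same collect-then-remove idea. The function name find_similar, the low parameter and the sample list are my own assumptions, not from the question. Note that it builds a new list instead of calling .remove(), so it drops every occurrence of a matched word rather than only the first one.

```python
from difflib import SequenceMatcher


def find_similar(tags, low=0.87):
    """Return (pairs, remaining) without mutating `tags` (hypothetical helper)."""
    similar_tags = []
    to_be_removed = set()
    for word1 in tags:
        for word2 in tags:
            # Same first letter and a ratio strictly between `low` and 1,
            # so identical words (ratio == 1.0) are never paired.
            if word1[0] == word2[0] and low < SequenceMatcher(None, word1, word2).ratio() < 1:
                similar_tags.append((word1, word2))
                to_be_removed.add(word1)
    # Build the filtered list only after both loops have finished.
    remaining = [word for word in tags if word not in to_be_removed]
    return similar_tags, remaining


# Illustrative sample data, not the full list from the question.
pairs, remaining = find_similar(['game', 'games', 'python', 'flask'])
print(pairs)
print(remaining)
```

Because every match is found in both directions, both words of a similar pair end up in to_be_removed, just as in the snippet above.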
Upvotes: 0
Reputation: 18838
If two words, word21 and word22, both match word1 under the specified constraints, then word1 is removed from the list when word21 matches, so there is no word1 left in the list to remove when word22 matches.
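A small reproduction may make this easier to see. The four-word list below is made up for illustration (it is not the data from the question), and the try/except is only there so the script keeps running after the failure.

```python
from difflib import SequenceMatcher

# Hypothetical sample: 'game' is similar to both 'games' and 'gamey'.
tag = ['game', 'games', 'gamer', 'gamey']
similar_tags = []
error = None

try:
    for word1 in tag:
        for word2 in tag:
            if word1[0] == word2[0]:
                if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                    similar_tags.append((word1, word2))
                    # The first match removes 'game'; the second match
                    # tries to remove it again and fails.
                    tag.remove(word1)
except ValueError as exc:
    error = exc

print(similar_tags)
print(tag)
print(type(error).__name__)
```

The inner loop also never compares 'game' with 'gamer' here, because removing an element shifts the remaining items while the loop is still running.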
Hence, you can correct it by the following modification:
for word1 in tag:
    is_found = False  # add this flag
    for word2 in tag:
        if word1[0] == word2[0]:
            if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                is_found = True  # remember the match; word1 is removed only after the inner loop ends
                similar_tags.append((word1, word2))
    if is_found:  # word1 matched at least once, so remove it from the list exactly once
        tag.remove(word1)
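The outer loop in that snippet still removes items from the list it is iterating over, so some words can be skipped. One possible variation, sketched below, is to iterate over a copy of the list. The helper names is_similar and collect_similar and the sample data are assumptions of mine, not part of the original code.

```python
from difflib import SequenceMatcher


def is_similar(a, b, low=0.87):
    """Same first letter and a ratio strictly between `low` and 1 (hypothetical helper)."""
    return a[0] == b[0] and low < SequenceMatcher(None, a, b).ratio() < 1


def collect_similar(tag):
    """Collect similar pairs and remove each matched word1 from `tag` once."""
    similar_tags = []
    for word1 in list(tag):  # iterate over a snapshot, not the list being changed
        matches = [word2 for word2 in tag if is_similar(word1, word2)]
        if matches:  # plays the role of the is_found flag
            similar_tags.extend((word1, word2) for word2 in matches)
            tag.remove(word1)
    return similar_tags


# Illustrative sample data only.
tag = ['game', 'games', 'gamey', 'python']
pairs = collect_similar(tag)
print(pairs)
print(tag)
```

The inner comparison is finished before anything is removed, so tag.remove(word1) runs at most once per word taken from the snapshot.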
Upvotes: 1