Reputation: 257
I wrote a machine learning algorithm that works perfectly. Now I have to iterate over all the items of a list against one another to generate a similarity score between 0.01 and 1.00. Here's the code:
temp = []  # placeholder; the real list holds 351 sentences
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            temp.remove(temp[end_node])
Here, I am trying to check each item against all the other items in the list; if any items are similar, I delete them from the list, since there is no benefit in checking the similarity of those sentences against other elements again, which would be a waste of computing power. But when I try to remove elements, I get an index out of range error:
<ipython-input-12-c1959947bdd1> in <module>
4 length = len(temp)
5 for start_node in range(length):
----> 6 doc1 = nlp(temp[start_node])
7 for end_node in range(++start_node, length):
8 doc2 = nlp(temp[end_node])
I am just trying to keep the original sentences and delete all the sentences in the list that are similar to them, so the loop doesn't check against those items again.
The temp list has 351 items; here I am just referencing it as a list.
Here's a sample of it:
print(temp[:1])
['malicious: caliche development partners "financial statement"has been shared with you']
I tried creating another, duplicated list and deleting the similar items from that list instead:
final_items = temp
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            final_items.remove(temp[end_node])
But I still got the same "list index out of range" error, even though I am deleting elements from a different list than the one I am iterating over.
Upvotes: 0
Views: 130
Reputation: 819
The problem with your code is that you are trying to delete/remove items from what you think is a clone of the original list while iterating through the original list. When you directly assign a list to another variable, it only creates a reference to the original list, not a copy.
Let's take your current code:
final_items = temp
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            final_items.remove(temp[end_node])
And let's say temp is the following list:
temp = [node1, node2, node3, ........, nodeN]
and
final_items = temp
where the list items belong to the class Node.
In this part:
elif 0.96 < similar < 0.99:
    print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
    final_items.remove(temp[end_node])
Since final_items is the same list as temp, when you remove an element from final_items, that element is also removed from temp. Just look at this simple example:
>>> a=[1,2,3]
>>> b=a
>>> b
[1, 2, 3]
>>> b.remove(1)
>>> a
[2, 3]
>>> b
[2, 3]
So in your case, imagine there were 100 nodes in the temp list. Your for loop will then check indexes up to 99. But while it is running, the temp list has been shortened, so index 99 no longer exists, which raises the index error.
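To see the index error itself in a toy case (purely illustrative, not your actual data), removing while the range still covers the original length is enough to reproduce it:
>>> a = [1, 2, 3]
>>> for i in range(len(a)):  # range is fixed at the original length, 3
...     a.remove(a[i])       # but the list shrinks on every pass
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range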
The easiest way to solve this is to create a hard copy of the list. There are several ways to create a hard copy of a list without just linking it to the original:
final_items = [n for n in temp]
or
from copy import deepcopy as dc
final_items = dc(temp)
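Applied to your loop, a minimal sketch could look like the following. I am assuming nlp is your loaded spaCy model, as in your question; the membership check before remove is only a precaution in case the same sentence has already been removed as an earlier duplicate. Note also that ++start_node is not an increment in Python, it is just start_node, so start_node + 1 is used here to skip comparing a sentence with itself.
final_items = [n for n in temp]  # hard copy: removing from final_items no longer touches temp

length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(start_node + 1, length):  # skip self-comparison
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            if temp[end_node] in final_items:  # guard: it may already have been removed as a duplicate
                final_items.remove(temp[end_node])
Since temp is never modified, its indexes stay valid for the whole loop, and final_items ends up as the deduplicated list.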
Upvotes: 0
Reputation: 449
I think your problem lies here:
temp.remove(temp[end_node])
You remove items from the temp list, and therefore the list indexing will run out of range.
Let's say that, to start with, temp contains 351 items, i.e. indexes 0 to 350.
Now the script removes 1 (or more) items from the temp list.
Suddenly the temp list has only 350 items, i.e. indexes 0 to 349.
However, the script still iterates using temp's original length of 351.
So when the script reaches the last iteration, index 350 (or earlier, if several items are removed), it will try to get a list index that no longer exists:
doc1 = nlp(temp[350])
since at this point the temp list's indexes only run from 0 to 349.
It is probably better to keep an additional copy of the list for modification, rather than modifying the list you iterate over.
If you create that additional list, remember to use the copy method:
final_items = temp.copy()
since a regular assignment only keeps a reference to the temp list.
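A quick illustration of the difference, using a toy list:
>>> a = [1, 2, 3]
>>> b = a.copy()  # independent copy, not a reference to a
>>> b.remove(1)
>>> a
[1, 2, 3]
>>> b
[2, 3]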
Python doc - copy()
Upvotes: 1