Reputation: 257
I wrote a machine learning algorithm that works perfectly. Now I have to iterate over all the items of a list against one another to generate a similarity score between 0.01 and 1.00. Here's the code:
temp = []  # placeholder; the real list holds 351 sentences
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            temp.remove(temp[end_node])
Here, I am trying to check each item against all the other items in the list; if any items are similar, I delete them from the list, since there is no benefit in checking the similarity of those sentences against other elements again, which would be a waste of computing power. But when I try to remove elements, I get an index out of range error:
<ipython-input-12-c1959947bdd1> in <module>
4 length = len(temp)
5 for start_node in range(length):
----> 6 doc1 = nlp(temp[start_node])
7 for end_node in range(++start_node, length):
8 doc2 = nlp(temp[end_node])
I am just trying to keep the original sentences and delete all the sentences in the list that are similar to them, so the loop doesn't check against those items again.
The temp list has 351 items; here I am just referencing it as a list.
Here's a sample of it:
print(temp[:1])
['malicious: caliche development partners "financial statement"has been shared with you']
I tried creating another, duplicated list and deleting the similar items from that list instead:
final_items = temp
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            final_items.remove(temp[end_node])
But I still got the same "list index out of range" error, even though I am deleting elements from a different list than the one I am iterating over.
Upvotes: 0
Views: 130
Reputation: 819
The problem with your code is that you are trying to delete/remove items from what you think is a clone of the original list while iterating through the original list. When you directly assign a list to another variable, it only creates a reference to the original list, not a copy.
Let's take your current code:
final_items = temp
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            final_items.remove(temp[end_node])
And let's say temp is the following list:
temp = [node1, node2, node3, ........, nodeN]
and
final_items = temp
where the list items belong to the class Node.
In this part:
elif 0.96 < similar < 0.99:
    print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
    final_items.remove(temp[end_node])
Since final_items is the same list as temp, when you remove an element from final_items, that element is also removed from temp. Just look at this simple example:
>>> a=[1,2,3]
>>> b=a
>>> b
[1, 2, 3]
>>> b.remove(1)
>>> a
[2, 3]
>>> b
[2, 3]
So in your case, imagine there were 100 nodes in the temp list. Your for loop will then check indexes up to 99. But while it is running, the temp list has been shortened, so index 99 no longer exists, which raises the index error.
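To see the index error itself in a toy case (purely illustrative, not your actual data), removing while the range still covers the original length is enough to reproduce it:
>>> a = [1, 2, 3]
>>> for i in range(len(a)):  # range is fixed at the original length, 3
...     a.remove(a[i])       # but the list shrinks on every pass
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range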
The easiest way to solve this is to create a hard copy of the list. There are several ways to create a hard copy of a list without just linking it to the original:
final_items = [n for n in temp]
or
from copy import deepcopy as dc
final_items = dc(temp)
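Applied to your loop, a minimal sketch could look like the following. I am assuming nlp is your loaded spaCy model, as in your question; the membership check before remove is only a precaution in case the same sentence has already been removed as an earlier duplicate. Note also that ++start_node is not an increment in Python, it is just start_node, so start_node + 1 is used here to skip comparing a sentence with itself.
final_items = [n for n in temp]  # hard copy: removing from final_items no longer touches temp

length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(start_node + 1, length):  # skip self-comparison
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------", temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            if temp[end_node] in final_items:  # guard: it may already have been removed as a duplicate
                final_items.remove(temp[end_node])
Since temp is never modified, its indexes stay valid for the whole loop, and final_items ends up as the deduplicated list.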
Upvotes: 0
Reputation: 449
I think your problem lies here:
temp.remove(temp[end_node])
You remove items from the temp list, and therefore the list indexing will run out of range.
Let's say that, to start with, temp contains 351 items, i.e. indexes 0 to 350.
Now the script removes 1 (or more) items from the temp list.
Suddenly the temp list has only 350 items, i.e. indexes 0 to 349.
However, the script still iterates using temp's original length of 351.
So when the script reaches the last iteration, index 350 (or earlier, if several items are removed), it will try to get a list index that no longer exists:
doc1 = nlp(temp[350])
since at this point the temp list's indexes only run from 0 to 349.
It is probably better to keep an additional copy of the list for modification, rather than modifying the list you iterate over.
If you create that additional list, remember to use the copy method:
final_items = temp.copy()
since a regular assignment only keeps a reference to the temp list.
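A quick illustration of the difference, using a toy list:
>>> a = [1, 2, 3]
>>> b = a.copy()  # independent copy, not a reference to a
>>> b.remove(1)
>>> a
[1, 2, 3]
>>> b
[2, 3]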
Python doc - copy()
Upvotes: 1