Lenny White

Reputation: 452

How to circumvent slow search with Index() method for a large list

I have a large list myList containing tuples.

I need to remove the duplicates in this list (that is, the tuples with the same elements in the same order). I also need to keep track of the list's indices in a separate list, indexList. If I remove a duplicate, I need to change its index in indexList to the first identical value's index.

To demonstrate what I mean, if myList looks like this:

myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]

Then I need to construct indexList like this:

indexList = [0, 1, 0, 2, 3, 1, 4]

Here the third value is identical to the first, so it gets index 0. Because that duplicate is removed, the next value, (8, 1), gets the updated index 2, and so on.

Here is how I achieved this:

unique = set()   # values seen so far
indexList = []
i = 0
for v in myList[:]:   # iterate over a copy so popping from myList is safe
    if v not in unique:
        unique.add(v)
        indexList.append(i)
        i = i + 1
    else:
        myList.pop(i)                       # drop the duplicate in place
        indexList.append(myList.index(v))   # O(n) lookup of the first occurrence

This does what I need. However, the index() method makes the script very slow when myList contains hundreds of thousands of elements. As I understand it, this is because index() is an O(n) operation.

So what changes could I make to achieve the same result but make it faster?

Upvotes: 1

Views: 52

Answers (1)

hyperneutrino

Reputation: 5425

If you make a dict to store the first index of each value, you can do the lookup in O(1) instead of O(n). In this case, before the for loop, do indexes = {}; then in the if block do indexes[v] = i, and in the else block use indexes[v] instead of myList.index(v).
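A minimal sketch of what that could look like when applied to the loop from the question (the dict name indexes follows the suggestion above; the other names reuse the question's):

myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]

unique = set()
indexList = []
indexes = {}   # maps each value to the index at which it was first kept
i = 0
for v in myList[:]:
    if v not in unique:
        unique.add(v)
        indexes[v] = i            # remember where this value ended up
        indexList.append(i)
        i = i + 1
    else:
        myList.pop(i)
        indexList.append(indexes[v])   # O(1) lookup instead of myList.index(v)

print(myList)     # [(6, 2), (4, 3), (8, 1), (5, 4), (2, 1)]
print(indexList)  # [0, 1, 0, 2, 3, 1, 4]

Since the keys of indexes already record which values have been seen, you could even drop the unique set and write the test as if v not in indexes.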

Upvotes: 1
