Reputation: 452
I have a large list myList
containing tuples.
I need to remove the duplicates in this list (that is the tuples with same elements in the same order). I also need to keep track of this list's indices in a separate list, indexList
. If I remove a duplicate, I need to change its index in indexList
to first identical value's index.
To demonstrate what I mean, if myList
looks like this:
myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]
Then I need to construct indexList
like this:
indexList = (0, 1, 0, 2, 3, 1, 4)
Here the third value is identical to first, so it (third value) gets index 0
. Also the subsequent value gets an updated index of 2
and so on.
Here is how I achieved this:
unique = set()
i = 0
for v in myList[:]:
if v not in unique:
unique.add(v)
indexList.append(i)
i = i+1
else:
myList.pop(i)
indexList.append(myList.index(v))
This does what I need. However index()
method makes the script very slow when myList
contains hundreds of thousands of elements. As I understand this is because it's an O(n) operation
.
So what changes could I make to achieve the same result but make it faster?
Upvotes: 1
Views: 52
Reputation: 5425
If you make a dict to store the first index of each value, you can do the lookup in O(1)
instead of O(n)
. So in this case, before the for loop, do indexes = {}
, and then in the if
block, do indexes[v] = i
and in the else
block use indexes[v]
instead of myList.index(v)
.
Upvotes: 1