MonoShiro
MonoShiro

Reputation: 83

Faster way of removing items from list

I have a list that is around 10M in length. Each index contains a dictionary

so for instance...

l = [{'id': 'y'}, {'id', 'x'}, {'id', 'z'} ... ]

I have another list with items that I want to remove

m = ['y', 'z']

I tried

l = [i for i in l if i['id'] not in m]

But as expected a run time of O(n^3) is not amazing. My second approach is to create a new dictionary to store the index of the items I wish to remove:

temp = {'y': 0, 'z': 2, ... }

for i in range(0, len(temp)):
    del l[temp[m[-1]]]
    del m[-1]

This did improve the execution time by A LOT (from somewhere around an hour to a few seconds or minutes for datasets of 1M). But since I'm storing the indexes somewhere, the memory used is rather high

My question is: is there a more efficient way of removing items from a large list in O(n) time and yet not use so much memory?

Upvotes: 0

Views: 771

Answers (2)

Gerrat
Gerrat

Reputation: 29700

I doubt you'll get much better than:

s = set(m)
l = [i for i in l if i['id'] not in s]

This would likely be much quicker then creating a loop and deleting items one at a time. There is usually a tradeoff though between memory and speed - this should be reasonably fast, but will use up to twice the memory of your list as it creates a new one.

Caveat: When I say "I doubt you'll get much better than..." I'm talking about standard Python. A numerical library like Pandas or Numpy could likely do much better in terms of both memory and time.

Upvotes: 2

user3404344
user3404344

Reputation: 1727

if you can store only the indices to be deleted in the list temp

temp = [0,2,...]

then this could work faster

np.delete(l,temp)

Upvotes: 1

Related Questions