Reputation: 83
I have a list that is around 10M elements long. Each element is a dictionary,
so for instance...
l = [{'id': 'y'}, {'id': 'x'}, {'id': 'z'}, ... ]
I have another list with items that I want to remove
m = ['y', 'z']
I tried
l = [i for i in l if i['id'] not in m]
But, as expected, a run time of O(n*m) is not amazing, since the not in m check scans the whole list m for every element.
My second approach is to create a new dictionary that stores the index of each item I wish to remove:
temp = {'y': 0, 'z': 2, ... }
for i in range(len(temp)):
    del l[temp[m[-1]]]  # delete by stored index (assumes m is ordered so the highest index goes first, so earlier deletions don't shift later ones)
    del m[-1]
This did improve the execution time by A LOT (from around an hour to a few seconds or minutes on datasets of 1M). But since I'm storing all the indexes, the memory usage is rather high.
My question is: is there a more efficient way of removing items from a large list in O(n) time without using so much memory?
Upvotes: 0
Views: 771
Reputation: 29700
I doubt you'll get much better than:
s = set(m)
l = [i for i in l if i['id'] not in s]
This would likely be much quicker than looping and deleting items one at a time. There is usually a tradeoff between memory and speed, though - this should be reasonably fast, but it creates a new list, so it will use up to twice the memory of your original.
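If that extra copy is a concern, one in-place variant (a sketch, not benchmarked at 10M scale) compacts the surviving items towards the front with a write cursor and then trims the tail, so no second list is allocated:
s = set(m)
write = 0
for read in range(len(l)):
    if l[read]['id'] not in s:
        l[write] = l[read]  # keep this dict, shifting it left over deleted slots
        write += 1
del l[write:]  # drop the leftover tail in a single O(k) step
This keeps the O(n) scan but reuses the existing list's storage, at the cost of less readable code.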
Caveat: When I say "I doubt you'll get much better than..." I'm talking about standard Python. A numerical library like Pandas or Numpy could likely do much better in terms of both memory and time.
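For illustration, a pandas version might look like this (a sketch, assuming every dict has an 'id' key; the round trip back to a list of dicts has its own cost):
import pandas as pd

df = pd.DataFrame(l)         # one row per dict, one column per key
df = df[~df['id'].isin(m)]   # vectorised membership test against m
l = df.to_dict('records')    # back to a list of dicts, if needed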
Upvotes: 2
Reputation: 1727
If you can store only the indices to be deleted in a list temp
temp = [0, 2, ...]
then this could work faster:
import numpy as np
l = np.delete(l, temp)  # np.delete returns a new array, so reassign the result
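Bear in mind that np.delete first converts a list of dicts into a NumPy object array (itself an O(n) step) and returns an array, not a list. A small usage sketch with made-up data:
import numpy as np

l = [{'id': 'y'}, {'id': 'x'}, {'id': 'z'}]
temp = [0, 2]  # positions to drop
l = np.delete(np.asarray(l, dtype=object), temp).tolist()
print(l)  # [{'id': 'x'}]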
Upvotes: 1