Reputation: 414
I'm trying to take a list of 4 million entries and rather than iterate over them all, reduce the list in the for loop that is enumerating them as it goes along.
The reduction criteria is found in the loop. Some later my_huge_list elements contain an combination of 2 consecutive elements that allows them to be discarded immediately.
Here I'm going to remove sublists with 1,2 and A,B in them from my_huge_list.
Please note I don't know in advance that 1,2 and A,B are illegal until I go into my for loop.
output_list = []
my_huge_list = [[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4],[0,1,2,3,4],[A,B],[0,1,3,A,B],[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4]...] #to 4m assorted entries
for sublist in my_huge_list[:]:
pair = None
for item_index in sublist[:-1]: #Edit for Barmar. each item in sublist is actually an object with attributes about allowed neighbors.
if sublist[item_index +1] in sublist[item_index].attributes['excludes_neighbors_list']:
pair = [sublist[item_index],sublist[item_index +1]] #TODO build a list of pairs
if pair != None: #Don't want pair in any item of output_list
my_huge_list = [x for x in my_huge_list if not ','.join(pair) in str(x)] #This list comprehension sole function to reduce my_huge_list from 4m item list to 1.7m items
#if '1, 2' in str(sublist): #Don't want 1,2 in any item of output_list
#my_huge_list = [x for x in my_huge_list if not '1, 2' in str(x)] #This list comprehension sole function to reduce my_huge_list
#elif 'A, B' in str(sublist): #Don't want A,B in any item of output_list
#my_huge_list = [x for x in my_huge_list if not 'A, B' in str(x)] #This list comprehension sole function to reduce my_huge_list from 1.7m item list to 1.1m items
else:
output_list.append(sublist)
my_huge_list
>>>[[0,1,3,4],[0,1,3,4],[0,1,3,4],[0,1,3,4]...]
So the 'for loop' unfortunately does not seem to get any faster because my_huge_list is still iterated over all 4m entries, even though it was quickly reduced by the list comprehension.
[The my_huge_list does not need to be processed in any order and does not need to be retained after this loop.]
[I have considered making the for loop into a sub-function and using map and also the shallow copy but can't figure this architecture out.]
[I'm sure by testing that the removal of list elements by list comprehension is quicker than brute-forcing all 4m sublists.]
Thanks!
Upvotes: 1
Views: 1703
Reputation: 13868
Here's my dig on it:
my_huge_list = [[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4],[0,1,2,3,4],['A','B'],[0,1,3,'A','B'],[0,'A','B'],[0,1,2,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,3,4],[0,1,2,4]] #to 4m assorted entries
# ... do whatever and return unwanted list... #
# ... if needed, convert the returned items into lists before putting into unwanted ... #
unwanted = [[1,2], ['A','B']]
index = 0
while index < len(my_huge_list):
sublist = my_huge_list[index]
next = True
for u in unwanted:
if u in [sublist[j:j+len(u)] for j in range(len(sublist)-len(u)+1)] or u == sublist:
my_huge_list.pop(index)
next = False
index += next
print(my_huge_list)
# [[0, 1, 3, 4], [0, 1, 3, 4], [0, 1, 3, 4], [0, 1, 3, 4]]
It's not elegant but it gets the job done. A huge caveat is that modifying a list
while iterating over it is bad karma (pros will probably shake their heads at me), but dealing with a size of 4 mil you can understand I'm trying to save some memory by modifying in place.
This is also scale-able so that if you have multiple numbers of unwanted
in different sizes, it should still catch it from your huge list. If your element size is 1, try to match the expected element type from your my_huge_list
. e.g. if your my_huge_list
has a [1], your unwanted should be [1] as well. If the element is a string
instead of list
, you'll need that string
in your unwanted
. int
/float
will however break this current code as you can't iterate over it, but you can add extra handling before you iterate through unwanted.
Upvotes: 1
Reputation: 780879
You're iterating over your huge list twice, once in the main for
loop, and then each time you find an invalid element you iterate over it again in the list comprehensions to remove all of those invalid elements.
It would be better to simply filter those elements out of the list once with a list comprehension.
def sublist_contains(l, pair):
for i in range(len(l)-1):
if l[i] == pair[0] and l[i+1] == pair[1]:
return True
return False
output_list = [sublist for sublist in my_huge_list if not(list_contains(sublist, ['A', 'B']) or list_contains(sublist, ['1', '2']))]
My sublist_contains()
function assumes it's always just two elements in a row you have to test for. You can replace this with a more general function if necessary. See elegant find sub-list in list
Upvotes: 0