Reputation: 1507
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
2 GB RAM, 4 GB Swap
----
import re
fieldname_groups = ("(ip,clienthost,ip_addr)", "(username,user,id)")
srcgroup = [x.strip() for x in re.sub('[\(\)]+', '', fieldname_groups[0]).split(',')]
dstgroup = [x.strip() for x in re.sub('[\(\)]+', '', fieldname_groups[1]).split(',')]
if not srcgroup or not dstgroup: raise Exception("No srcgroup or dstgroup specified!")
# srcgroup is now ['ip', 'clienthost', 'ip_addr']
# dstgroup is now ['username', 'user', 'id']
# Now append the string '.keyword' to a copy of every string in each list
[srcgroup.append('%s.keyword' % x) for x in srcgroup]
[dstgroup.append('%s.keyword' % x) for x in dstgroup]
# srcgroup should now be ['ip', 'clienthost', 'ip_addr', 'ip.keyword', 'clienthost.keyword', 'ip_addr.keyword']
# dstgroup should be similar
Every time I run this code, once I hit the list comprehensions the memory balloons up and the process is killed.
I'm not understanding what the problem is here; I feel like this is something I do all the time yet this time it's not working, so it's probably an amateur mistake but I'd appreciate any help in sorting it out. I've even tried rewriting the code to use a standard for loop but it still explodes.
Thanks!
Upvotes: 0
Views: 220
Reputation: 7980
As was stated above, your problem is that you're appending to a collection you're iterating on. That being the case, copying an entire list is somewhat wasteful. To avoid, creating another copy of the list, you could do:
srcgroup.extend( ['%s.keyword' % x for x in srcgroup] )
dstgroup.extend( ['%s.keyword' % x for x in dstgroup] )
which would modify the list in-place. Here's a performance comparison:
timeit.timeit('arr = range(1000); arr = arr.extend( [x + 1 for x in arr] )')
53.47952483119644
timeit.timeit('arr = range(1000); arr = [arr.append(x + 1) for x in arr[:]]')
118.02281077109734
As you can see, my suggestion takes half the time (although they're both still O(n)).
Upvotes: 1
Reputation: 195553
You are adding to a list, while you are iterating it -> the memory will grow up indefinitely:
[srcgroup.append('%s.keyword' % x) for x in srcgroup]
[dstgroup.append('%s.keyword' % x) for x in dstgroup]
The solution is to iterate over a copy:
[srcgroup.append('%s.keyword' % x) for x in srcgroup[:]]
[dstgroup.append('%s.keyword' % x) for x in dstgroup[:]]
Edit:
Better idea would be simplify things a little bit (from answers below):
srcgroup = ['ip', 'keyword']
srcgroup.extend([f'{x}.keyword' for x in srcgroup])
print(srcgroup)
Will output:
['ip', 'keyword', 'ip.keyword', 'keyword.keyword']
Upvotes: 2
Reputation: 18851
In a list comprehension, you should have side-effects (such as appending items to a list). The list comprehension creates a new list, you don't need to append
items manually.
If you want to replace each element in the list, you can create a new list and replace the old one:
scrgroup = [('%s.keyword' % x) for x in srcgroup]
dstgroup = [('%s.keyword' % x) for x in dstgroup]
However, if you want to add to the list a copy of each element with that formatting, you need to extend it:
scrgroup.extend([('%s.keyword' % x) for x in srcgroup])
dstgroup.extend([('%s.keyword' % x) for x in dstgroup])
However, to reduce memory usage, you could use a generator in that case:
scrgroup.extend((('%s.keyword' % x) for x in srcgroup))
dstgroup.extend((('%s.keyword' % x) for x in dstgroup))
Upvotes: 2
Reputation: 532023
The correct way to extend a list is to use the extend
method.
srcgroup = [x.strip() for x in re.sub('[\(\)]+', '', fieldname_groups[0]).split(',')]
srcgroup.extend(['%s.keyword' % x) for x in srcgroup])
Note that it is critical that the argument to extend
is a new list, and not a generator expression that iterates srcgroup
as new elements are added to srcgroup
.
Under no circumstances should you use a list comprehension solely for the side effect of the expression. Use
for x in ...:
...
instead of
[... for x in ...]
Upvotes: 3
Reputation: 8338
As @Andrej Kesely said, you're adding elements to the list you're iterating while iterating. So, you keep adding elements to the loop, and thus the loop never ends.
I'm not sure what you want exactly, but maybe what you should do is create a new list on the fly to iterate over it, instead of the one you're adding the elements to, as such:
[srcgroup.append('%s.keyword' % x) for x in srcgroup[:]] # Notice the [:]
[dstgroup.append('%s.keyword' % x) for x in dstgroup[:]] # Notice the [:]
Adding the [:] at the end of the list name slices it keeping all the elements, and saves it to a new object on the fly, so the error doesn't happen.
Upvotes: 2