Given old_list = list(range(1000)), there are two ways I know of to create a new, flat list from multiple sublists:
new_list = []
for i in range(10,20):
    new_list.extend(old_list[:i])
or, as a nested comprehension followed by a flattening pass:
new_list = [old_list[:i] for i in range(10,20)]
new_list = [item for sublist in new_list for item in sublist]
I'm not sure which of these is more efficient for longer lists, or whether one uses less memory than the other. The latter seems more Pythonic, but I don't like having to go back and flatten, and I haven't found much discussion of the overhead of extend (whereas there's plenty about append).
@juanpa.arrivillaga was exactly correct in their diagnosis of the problem: Python does not bloat memory for this kind of operation, because it simply fills the new list with references to the same objects. I checked this using the memory-profiler package available from PyPI, with the list elements being more-relevant ndarray objects:
Line # Mem usage Increment Occurences Line Contents
============================================================
26 27.8 MiB 27.8 MiB 1 @profile
27 def double_listcomp_test():
28 1904.3 MiB 1876.6 MiB 10003 old_list = [a for a in
29 1904.3 MiB 0.0 MiB 1 np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
30 1904.4 MiB 0.1 MiB 5008 old_list = [a for _ in range(5) for a in random.sample(old_list,k=1000)]
31
32 1904.4 MiB 0.0 MiB 1 print(len(old_list))
33 1904.4 MiB 0.0 MiB 1 return old_list
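To double-check that the flat list shares objects rather than copying them, here is a minimal sketch (toy sizes; the names are illustrative, not from the profiled code):

import numpy as np

old_list = [np.zeros((4, 4)) for _ in range(10)]
# Flatten by iterating the sublists directly in a single comprehension.
flat = [a for _ in range(3) for a in old_list]
# Every entry in flat is the very same object as in old_list,
# so no element data is duplicated.
print(flat[0] is old_list[0])  # True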
This single-listcomp approach works perfectly for me, and I think it is at least as readable as the other two approaches for a Python programmer used to working with generator expressions. We are essentially treating the collection of sublists as an iterator rather than as a list of its own. My "accepted" approach for the example above is then:
my_list = list(range(1000))
my_list = [item for i in range(10,20) for item in my_list[:i]]
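If you prefer the standard library, itertools.chain.from_iterable expresses the same flattening and likewise stores references rather than copies (a sketch, equivalent to the comprehension above):

from itertools import chain

my_list = list(range(1000))
my_list = list(chain.from_iterable(my_list[:i] for i in range(10, 20)))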
What I actually found I needed to do was copy the arrays into the new list so that I could delete the bigger old list. This isn't really within the scope of the question, since it obviously leads to larger bloat at creation time, but I wanted to include it because this question is what led me here. In this case I think the commented suggestion is especially apt, because it avoids having to unPythonically del old_list (see below).
Two listcomps to flatten (as in question):
Line # Mem usage Increment Occurences Line Contents
============================================================
16 27.7 MiB 27.7 MiB 1 @profile
17 def listcomp_test():
18 1904.3 MiB 1876.6 MiB 10003 old_list = [a for a in
19 1904.3 MiB 0.0 MiB 1 np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
20 1904.4 MiB 0.1 MiB 8 new_list = [random.sample(old_list,k=1000) for _ in range(5)]
21 2842.6 MiB 938.3 MiB 5008 new_list = [item.copy() for sublist in new_list for item in sublist]
22
23 2842.7 MiB 0.0 MiB 1 print(len(new_list))
24 2842.7 MiB 0.0 MiB 1 return new_list
Two listcomps with del old_list:
Line # Mem usage Increment Occurences Line Contents
============================================================
16 27.7 MiB 27.7 MiB 1 @profile
17 def listcomp_test():
18 1904.3 MiB 1876.6 MiB 10003 old_list = [a for a in
19 1904.3 MiB 0.0 MiB 1 np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
20 1904.4 MiB 0.1 MiB 8 new_list = [random.sample(old_list,k=1000) for _ in range(5)]
21 2842.7 MiB 938.3 MiB 5008 new_list = [item.copy() for sublist in new_list for item in sublist]
22 967.1 MiB -1875.5 MiB 1 del old_list
23
24 967.2 MiB 0.0 MiB 1 print(len(new_list))
25 967.2 MiB 0.0 MiB 1 return new_list
Single listcomp (as in comment):
Line # Mem usage Increment Occurences Line Contents
============================================================
26 27.8 MiB 27.8 MiB 1 @profile
27 def double_listcomp_test():
28 1904.3 MiB 1876.6 MiB 10003 my_list = [a for a in
29 1904.3 MiB 0.0 MiB 1 np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
30 2842.6 MiB -937.2 MiB 5008 my_list = [a.copy() for _ in range(5) for a in random.sample(my_list,k=1000)]
31
32 967.1 MiB -1875.5 MiB 1 print(len(my_list))
33 967.1 MiB 0.0 MiB 1 return my_list
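For reference, a runnable reconstruction of the profiled function (assuming memory-profiler is installed via pip install memory-profiler; the sizes match the runs above):

import random
import numpy as np
from memory_profiler import profile

@profile
def double_listcomp_test():
    # ~1.9 GiB of uint8 data; iterating the 4-D array yields 3-D views.
    my_list = [a for a in np.random.randint(0, 255, (10000, 256, 256, 3), dtype=np.uint8)]
    # .copy() materializes each sampled frame, so the big parent array
    # can be freed once my_list is rebound.
    my_list = [a.copy() for _ in range(5) for a in random.sample(my_list, k=1000)]
    print(len(my_list))
    return my_list

if __name__ == "__main__":
    double_listcomp_test()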
NOTE: Python is certainly not the best language to use if you're this concerned about memory management, but in data-science applications we often don't have a choice, so I think it's prudent to have this available as an answer. In every case Python has to hold the extra memory while building the new list, but the suggested single-listcomp approach avoids needlessly binding intermediate results to new names, so the extra memory is kept around for as little time as possible.
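That rebinding point, as a minimal sketch (sizes here are illustrative only):

import numpy as np

my_list = [np.zeros((1000, 1000)) for _ in range(5)]  # ~40 MB of arrays
# Rebinding the same name drops the old list's last reference as soon as
# the assignment completes, so it can be freed immediately, no del needed.
my_list = [a.copy() for a in my_list]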