Reputation: 366
I'm iterating through a list of over three million items and am assigning them integer values. For organization, I have made a dictionary whose keys are the integers and values are the list of items with that score. A priori, I do not know how many items will have a certain score, so I'm using the + operator to append onto the list as follows:
for e in xs:
myDict[val(e)] = myDict.get(val,[]) + [e]
My questions are:
Upvotes: 3
Views: 4610
Reputation: 180471
Yes, append and use a collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for e in xs:
d[val(e)].append(e)
appending is an 0(1)
operation, your approach is linear 0(n+k)
as you create a new list each time.
If you were in a situation where you were adding multiple items, you would extend
your list.
One thing to note is that my_list += some_list
is equivalent to mylist.extend(some_list)
, but it is very different from my_list = my_list + some_list
. The former adds to the original list, and the latter is doing what your code is doing, concatenating two lists to create a completely new list.
The complexity for extend
or +=
is 0(k)
where k is the length of some_list
.
wiki.python has a list of the complexity of common operations in Python.
Upvotes: 2
Reputation: 85492
Use append
:
for e in xs:
myDict.setdefault(val(e), []).append(e)
This avoids building a new list every time. The operation list1
+ list2
needs to build a new list in each iteration and hence allocate memory. append
is more efficient because a list as pre-allocated memory at the end. For example, building a list with append starting from an empty list to a list with 10 million entries takes a bit more than 100 memory allocations.
The setdefault method of dictionaries returns the corresponding value if the key exist. If the key is not in th dictionary, it returns the default. In this case the default is a list. Due to the mutability of lists, we can append to the empty list in the first and to the partially filled list in each subsequent iteration.
An alternative to using setdefault()
is collections.defaultdict. Do some profiling to find out which one is faster.
Upvotes: 4