user217285
user217285

Reputation: 366

Efficiently adding elements to a list in Python

I'm iterating through a list of over three million items and am assigning them integer values. For organization, I have made a dictionary whose keys are the integers and values are the list of items with that score. A priori, I do not know how many items will have a certain score, so I'm using the + operator to append onto the list as follows:

for e in xs:
   myDict[val(e)] = myDict.get(val,[]) + [e]

My questions are:

  1. Is there a cleaner way of doing this?
  2. What's the time complexity of the + operation? Is it creating an entirely new list, copying the elements from the original list, and adding them in?
  3. What if I was adding an element to a set?

Upvotes: 3

Views: 4610

Answers (2)

Padraic Cunningham
Padraic Cunningham

Reputation: 180471

Yes, append and use a collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)
for e in xs:
   d[val(e)].append(e)

appending is an 0(1) operation, your approach is linear 0(n+k) as you create a new list each time.

If you were in a situation where you were adding multiple items, you would extend your list.

One thing to note is that my_list += some_list is equivalent to mylist.extend(some_list), but it is very different from my_list = my_list + some_list. The former adds to the original list, and the latter is doing what your code is doing, concatenating two lists to create a completely new list.

The complexity for extend or += is 0(k) where k is the length of some_list.

wiki.python has a list of the complexity of common operations in Python.

Upvotes: 2

Mike Müller
Mike Müller

Reputation: 85492

Use append:

for e in xs:
   myDict.setdefault(val(e), []).append(e)

This avoids building a new list every time. The operation list1 + list2 needs to build a new list in each iteration and hence allocate memory. append is more efficient because a list as pre-allocated memory at the end. For example, building a list with append starting from an empty list to a list with 10 million entries takes a bit more than 100 memory allocations.

The setdefault method of dictionaries returns the corresponding value if the key exist. If the key is not in th dictionary, it returns the default. In this case the default is a list. Due to the mutability of lists, we can append to the empty list in the first and to the partially filled list in each subsequent iteration.

An alternative to using setdefault() is collections.defaultdict. Do some profiling to find out which one is faster.

Upvotes: 4

Related Questions