nojohnny101
nojohnny101

Reputation: 562

speed of methods for building multiple list (comprehensions or not?)

I am trying to find the most efficient way to build two separate lists with an if condition to evaluate a boolean value returned from a function.

The trunc_keys list contains have no more than 15 string elements. Test was run with 10 elements.

Common Function:

def is_trunc(key):
      # about 10 lines of string manipulation that ultimately returns boolean

Method 1:

trunc_key_list = [key for key in trunc_keys if is_trunc(key)]
bad_key_list = [key for key in trunc_keys if not is_trunc(key)]

Method 2:

trunc_key_list = []
bad_key_list = []

[trunc_key_list.append(key) if is_trunc(key) else bad_key_list.append(key) for key in trunc_keys]

I timed the results using start_time = time.time() and print("%s" % (time.time() - start_time))

Results (average of 20 runs)

Method 1: 0.000411 Method 2: 0.000280

I expected Method 1 to be faster. I thought list comprehensions were ideal for this type of situation by avoiding having to instantiate empty lists. I found this thread which seems to support this result: Python list() vs list comprehension building speed

I am fairly new to python and would like to understand this better. Do list comprehensions not become advantageous with such a small list size? Am I missing something else?

I appreciate any insights, thank you!

Upvotes: 0

Views: 39

Answers (1)

James Wasson
James Wasson

Reputation: 514

If you notice we can count the number of time each statement is executed:

Let L be the length of the list

  • Method 1 times ran
    • is_trunc(key) - ran L * 2
    • append(key) - ran L * 0
  • Method 2 times ran
    • is_trunc(key) - ran L * 1
    • append(key) - ran L * 1

from this post we can see that the average time for append is:

~115 μs or 0.000115 seconds

  • Method 1's (single loop) total time without append: 0.000411 / 2 = 0.0002055
  • Method 2's total time without append: 0.000280 - 0.000115 = 0.000165

We can then subtract Method 1's total time and Method 2's total time to get:

  • loop time difference: 0.0002055 - 0.000165 = 0.0000405
  • total loop time difference: 0.0000405 * 2 = 0.000081

0.000081 is basically negligible. So the loops run nearly exactly the same speed but in the first method you iterate over the list twice so it takes twice as long.

Take Away: is_trunc(key) is expensive! What are you doing in there?? :)

Upvotes: 1

Related Questions