abadawi
abadawi

Reputation: 418

list.count() vs Counter() performance

While trying to find the frequency of a bunch of characters in a string, why does running string.count(character) 4 times for 4 different characters yield faster execution time (using time.time()) than using a collections.Counter(string)?

Background: Given a sequence of moves represented by a string. Valid moves are R (right), L (left), U (up), and D (down). Return True if the sequence of moves takes me back to the origin. Otherwise, return false.


# approach - 1 : iterate 4 times (3.9*10^-6 seconds)
def foo1(moves):
    return moves.count('U') == moves.count('D') and moves.count('L') == moves.count('R')

# approach - 2 iterate once (3.9*10^-5 seconds)
def foo2(moves): 
    from collections import Counter
    d = Counter(moves)
    return d['R'] == d['L'] and d['U'] == d['D']

import time
start = time.time()
moves = "LDRRLRUULRLRLRLRLRLRLRLRLRLRL"
foo1(moves)
# foo2(moves)
end = time.time()
print("--- %s seconds ---" % (end - start))

These results are the opposite of what I had expected. My reasoning is that first approach should take longer because the string is iterated over 4 times whereas in the second approach, we iterate only once. Could it be due to the library call overhead?

Upvotes: 4

Views: 4134

Answers (1)

ShadowRanger
ShadowRanger

Reputation: 155536

Counter is faster in theory, but has higher fixed overhead, especially compared to str.count, which can scan the underlying C array with direct memory comparisons, where list.count has to do rich comparisons for each element; converting moves to a list of single characters nearly triples the time for foo1 in local tests, from 448 ns to 1.3 μs (while foo2 actually gets a tiny bit faster, dropping from 5.6 μs to 5.48 μs).

Other problems:

  1. Importing an already imported module uses the cached import, but there is a surprising amount of overhead involved in even a cached import (the loading machinery has a lot of stuff to check to make sure it's okay to do so); in local tests, moving from collections import Counter to the top level reduced the runtime of foo2 by 1.6 μs (5.6 μs with single global import, 7.2 μs with local per-call import). This will vary a lot by environment; on another machine (with less stuff installed in both user and system site-packages), the overhead was only 0.75 μs. Regardless, it's a significant, avoidable disadvantage for foo2.
  2. Counter on modern Python uses a C accelerator to speed up counting, but the accelerator only provides a benefit when the iterable is long enough. If you use the list form of moves, but multiply it by 100 to make a longer sequence, the difference drops, relatively speaking (to 106 µs for foo1 vs. 140 µs for foo2)
  3. You're just not counting very many things; when there are only four things you care about, paying O(n) four times can easily beat paying O(n) once if the former case has lower constant multipliers (which aren't included in big-O notation) than the latter. Counter remains O(n) for any number of unique things being counted; calling .count is O(n) per call, but if you need to know the count of every unique thing in the input, for inputs that are mostly unique, individual .count calls for each will be asymptotically O(n²).
  4. The .count approach is short-circuiting in your specific case, so it isn't even doing O(n) work four times, just twice; the U and D counts don't match, so it never counts L and R at all. Counter doesn't get meaningfully slower if it can't short-circuit (all the cost is paid in the single counting pass), but your foo1, in the same benchmark I used from point #2 (longer input, in list form), goes from 106 µs to 185 µs if I just add a single D to the end of the (pre-multiplication) moves (making the U and D counts the same, and requiring two more count calls); foo2 only goes up to 143 µs (from 140 µs), presumably because moves actually got longer (adding the D before multiplying by 100 meant it went from 2900 elements to count to 3000).

Basically, you had some minor implementation weaknesses, but mostly, you happened to choose a use case that gave all the advantage to .count, none to Counter. If your inputs are always str, and you're only counting them a small, fixed number of times, then sure, repeated calls to count are generally going to win. But for arbitrary input types (especially iterators, where count is impossible, both because it doesn't exist, and because you can only iterate it once), especially larger ones, with more unique things to count, where consistent performance counts (so relying on short-circuiting to reduce the number of count calls isn't acceptable), Counter will win.

Upvotes: 8

Related Questions