Reputation: 73
I am writing python2.7.15 code to access chars inside a word. How can I optimize this process, in order to check also if every word is contained inside an external list?
I have tried two versions of python2 code: version(1) is an extended version of what my code has to do, whereas in version (2) I tried a compact version of the same code.
chars_array = ['a','b','c']
VERSION (1)
def version1(word):
chars =[x for x in word]
count = 0
for c in chars:
if not c in chars_array:
count+=1
return count
VERSION (2)
def version2(word):
return sum([1 for c in [x for x in word] if not c in chars_array])
I am analyzing a large corpus and for version1 I obtain an execution time of 8.56 sec, whereas for version2 it is 8.12 sec.
Upvotes: 2
Views: 73
Reputation: 15349
With chars_array = ['a', 'e', 'i', 'o', 'u', 'y']
and words
equal to a list
of 56048 English words, I measured a number of variants with a command similar to the following at an IPython prompt:
%timeit n = [version1(word) for word in words]
In each case it reported "10 loops, best of 3", and I have shown the time per loop in comments next to each function definition below:
# OP's originals:
def version1(word): # 163 ms
chars =[x for x in word]
count = 0
for c in chars:
if not c in chars_array:
count+=1
return count
def version2(word): # 173 ms
return sum([1 for c in [x for x in word] if not c in chars_array])
Now let's hit version1
and version2
with three optimizations:
word
directly instead;not in
rather than negating the result of the in
operator;set
rather than a list
._
chars_set = set(chars_array)
def version1a(word): # 95.5 ms
count = 0
for c in word:
if c not in chars_set:
count+=1
return count
def version2a(word): # 104 ms
return sum([1 for c in word if c not in chars_set])
So there's actually an advantage for the multi-line code over the list comprehension. This may depend on word length, though: version2a
has to allocate a new list the same length as the word, whereas version1a
does not. Let's refine version2a
further to give it that same advantage, by summing over a generator expression rather than a list comprehension:
def version2b(word): # 111 ms
return sum(1 for c in word if c not in chars_set)
To my surprise that was actually slightly counterproductive—but again, that effect may depend on word length.
Finally let's experience the power of .translate()
:
chars_str = ''.join(chars_set)
def version3(word): # 40.7 ms
return len(word.translate(None, chars_str))
We have a clear winner.
Upvotes: 1
Reputation: 16593
The fastest solution (can be up to 100x faster for an extremely long string):
joined = ''.join(chars_array)
def version3(word):
return len(word.translate(None, joined))
Another slower solution that is approximately the same speed as your code:
from itertools import ifilterfalse
def version4(word):
return sum(1 for _ in ifilterfalse(set(chars_array).__contains__, word))
Timings (s
is a random string):
In [17]: %timeit version1(s)
1000 loops, best of 3: 79.9 µs per loop
In [18]: %timeit version2(s)
10000 loops, best of 3: 98.1 µs per loop
In [19]: %timeit version3(s)
100000 loops, best of 3: 4.12 µs per loop # <- fastest
In [20]: %timeit version4(s)
10000 loops, best of 3: 84.3 µs per loop
Upvotes: 2