Aneesh Dogra
Aneesh Dogra

Reputation: 740

Why is list element replacement slower than string element replacement in python?

I am trying to replace a set of elements in a data-structure to some other value. It seems that doing such kind of replacement is significantly faster in strings than in lists in case of python (as revealed by the benchmarking tests below). Can someone please explain why.

Note: These tests were performed on using python 2.7.

def string_replace_test(s, chars):
    """Replaces a set of chars to 0"""
    new = s
    for c in chars:
        new = new.replace(c, '0')
    return new

def list_replace_test(s, chars):
    """Replaces a set of chars to 0"""
    for a in xrange(len(s)):
        if s[a] in chars:
            s[a] = '0'

if __name__ == '__main__':
    import timeit
    s = """
        Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec
        etfringilla purus. Pellentesque bibendum urna at neque consectetur
        at tincidunt nulla luctus. Pellentesque augue lacus, interdum id
        lectus vitae, laoreet suscipit arcu.
        """
    s2 = list(s)
    chars = ['a', 'e', 'i', 'o', 'u']
    print(timeit.timeit("string_replace_test(s, chars)", setup="from __main__ import string_replace_test, s, chars"))
    print(timeit.timeit("list_replace_test(s2, chars)", setup="from __main__ import list_replace_test, s2, chars"))

Output:

5.09572291374
49.3243050575

Using range():

5.01253795624
53.2320859432

Upvotes: 1

Views: 476

Answers (4)

Lennart Regebro
Lennart Regebro

Reputation: 172269

There are several reasons for the difference in speed here. The main one is that in your first example, you make five calls to a function:

for c in ['a', 'e', 'i', 'o', 'u']:
    new = new.replace(c, '0')

In the second case, you however loop over the string, which is 259 characters long, and make at least on call (s[a] in chars ends up being a call) for each of these:

if s[a] in chars:
    s[a] = '0'

Already there, we have 51 times as many calls, and calls are not cheap. The check to see if the character should be replaced is much faster than the replace function you call, but this is one significant reason that the latter function is slow.

Upvotes: 2

Bakuriu
Bakuriu

Reputation: 101969

The difference is mostly due to the fact that str.replace is a method implemented in C, which can iterate much faster over the string. Also it can use simpler comparisons(using simple C functions) instead of calls to python methods.

You can see the huge different pretty easily:

In [3]: s = """
   ...:      Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec
   ...:      etfringilla purus. Pellentesque bibendum urna at neque consectetur
   ...:      at tincidunt nulla luctus. Pellentesque augue lacus, interdum id
   ...:      lectus vitae, laoreet suscipit arcu.
   ...:      """

In [4]: s2 = list(s)

In [5]: %%timeit
   ...: s.replace('a', '0')
   ...: 
1000000 loops, best of 3: 545 ns per loop

In [6]: %%timeit
   ...: for i, el in enumerate(s2):
   ...:     if el == 'a':
   ...:         s2[i] = '0'
   ...: 
100000 loops, best of 3: 17.9 us per loop
In [7]: 17.9 * 1000 / 545
Out[7]: 32.84403669724771

As you can see str.replace runs 33 times faster than the pure python loop. Even though your list code should be faster when you want to replace many vowels(in particular if you use a set instead of a list as chars parameter), the number of characters to replace must be big in order to make the code efficient enough.

For example:

In [14]: %%timeit
    ...: for i, el in enumerate(s2):
    ...:     if el in 'abcdefghijklmnopqrstuvwxyz':
    ...:         s2[i] = '0'
    ...: 
100000 loops, best of 3: 16.4 us per loop

Note that the timing is almost the same as before, while:

In [17]: %%timeit S = s
    ...: for c in 'abcdefghijklmnopqrstuvwxyz':
    ...:     S = S.replace(c, '0')
    ...: 
100000 loops, best of 3: 5.63 us per loop

Is still faster, but the timing increased by 10x.

In fact the fastest way to change some characters from a string is to use the translate method, which allows to do multiple replaces with a single call:

In [1]: import string

In [2]: table = string.maketrans('aeiou', '00000')

In [3]: s = """
   ...:      Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec
   ...:      etfringilla purus. Pellentesque bibendum urna at neque consectetur
   ...:      at tincidunt nulla luctus. Pellentesque augue lacus, interdum id
   ...:      lectus vitae, laoreet suscipit arcu.
   ...:      """

In [4]: %timeit s.translate(table)
1000000 loops, best of 3: 557 ns per loop

Note that it takes the same time as a single str.replace but it is doing all the replacements in a single pass, like the code you have for the list.

Note that in python3 str.translate will be significantly slower than in python2, especially if you translate only few characters. This is because it must handle unicode characters and thus uses a dict to perform the translations instead of indexing a string.

Upvotes: 5

Martijn Pieters
Martijn Pieters

Reputation: 1122322

Since there is no list.replace() function, you built your own, but picked a slow method.

Use a list comprehension instead:

def list_replace_test(s, chars):
    """Replaces a set of chars to 0"""
    return [a if a not in chars else '0' for a in s]

This will still be slower than string replacement because you cannot avoid the Python loop here.

Using a set for chars helps:

chars = set(chars)

but the fastest way to replace individual characters in text is actually a different technique altogether. Use str.translate() for that:

from string import maketrans

map = maketrans('aeiou', '0' * 5)
def str_translate(s, map):
    return s.translate(map)

With these changes the timings become:

>>> timeit.timeit("list_replace_test(s2, chars)", setup="from __main__ import list_replace_test, s2, chars")
28.60542392730713
>>> timeit.timeit("string_replace_test(s, chars)", setup="from __main__ import string_replace_test, s, chars")
4.002871990203857
>>> timeit.timeit("str_translate(s, map)", setup="from __main__ import str_translate, s, map")
0.7250571250915527

blowing the looped str.replace() calls right out of the water too.

Upvotes: 5

Notice that your second does not only test the string.replace. It is different algorithm altogether, and you can find some speedup by replacing chars with set(chars):

def list_replace_set_test(s, chars):
    """Replaces a set of chars to 0"""
    chars = set(chars)
    for a in xrange(len(s)):
        if s[a] in chars:
            s[a] = '0'

Upvotes: 1

Related Questions