caot

Reputation: 3328

python split by character vs default

The Python library function namedtuple from collections (see https://github.com/python/cpython/blob/master/Lib/collections/__init__.py) begins like this:

def namedtuple(typename, field_names, *, verbose=False, rename=False, module=None):

    # Validate the field names.  At the user's option, either generate an error
    # message or automatically replace the field name with a valid name.
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
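To see what those two lines do in isolation, here is a minimal sketch. normalize_field_names is a hypothetical helper, not part of the standard library; it copies the string branch of the excerpt, and the handling of non-string input is my own assumption.

```python
from typing import Iterable, List, Union


def normalize_field_names(field_names: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper mirroring the two lines quoted from namedtuple().

    A single string may separate names with commas, whitespace, or both.
    """
    if isinstance(field_names, str):
        # Turn every comma into a space, then let split() with no argument
        # break on runs of whitespace and drop empty pieces.
        return field_names.replace(',', ' ').split()
    # Assumption: non-string input is already a sequence of names.
    return [str(name) for name in field_names]


print(normalize_field_names('x, y'))      # ['x', 'y']
print(normalize_field_names('x y'))       # ['x', 'y']
print(normalize_field_names(['x', 'y']))  # ['x', 'y']
```

Because split() with no argument collapses consecutive whitespace, stray spaces around or between commas never produce empty or space-padded names.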

The last line above uses replace(',', ' ').split() rather than split(','). I'm wondering what the reason for this is.

Here is the test code to measure the time cost:

from random import randrange


def create_str(n):
    a = []
    for _i in range(n):
        a.append(str(randrange(101)))

    return ','.join(a)


s = create_str(1000)

# print(s)


def test_a():
    s.split(',')


def test_b():
    s.replace(',', ' ').split()


if __name__ == '__main__':
    import timeit
    print(['test_a: ', timeit.timeit("test_a()", setup="from __main__ import test_a")])
    print(['test_b: ', timeit.timeit("test_b()", setup="from __main__ import test_b")])

The output from the above (with s = create_str(1000)):

['test_a: ', 59.938546671997756]
['test_b: ', 68.51630863297032]

With s = create_str(10), I got the following:

['test_a: ', 0.9246872899821028]
['test_b: ', 1.2178910280345008]

With s = create_str(100), I got the following:

['test_a: ', 6.570624853018671]
['test_b: ', 7.8685859580291435]

test_a is faster in every case.
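Note that timeit.timeit runs the statement 1,000,000 times by default, so the figures above are totals for one million calls; for example, 59.94 s for n=1000 works out to roughly 60 µs per call. A minimal sketch of the same comparison reported per call, using timeit.repeat and keeping the best run, might look like this (per_call_seconds is a hypothetical helper, and the number/repeat values are arbitrary choices):

```python
import timeit
from random import randrange


def create_str(n):
    # Same input as the question: n random integers 0..100 joined by commas.
    return ','.join(str(randrange(101)) for _ in range(n))


def per_call_seconds(stmt, s, number=1_000, repeat=5):
    # timeit.repeat returns the total time of `number` calls for each run;
    # take the fastest run and divide to get seconds per call.
    runs = timeit.repeat(stmt, globals={'s': s}, number=number, repeat=repeat)
    return min(runs) / number


if __name__ == '__main__':
    for n in (10, 100, 1000):
        s = create_str(n)
        a = per_call_seconds("s.split(',')", s)
        b = per_call_seconds("s.replace(',', ' ').split()", s)
        print(f"n={n}: split(',') {a * 1e6:.2f} us, replace+split {b * 1e6:.2f} us")
```

Taking the minimum of several runs is a common way to reduce noise from other processes; absolute numbers will vary by machine and Python version.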

Update:

The documentation at https://docs.python.org/3/library/collections.html#collections.namedtuple says the following:

The field_names are a sequence of strings such as ['x', 'y']. Alternatively, field_names can be a single string with each fieldname separated by whitespace and/or commas, for example 'x y' or 'x, y'.
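A quick check of that documented behaviour: each spelling below (whitespace, commas, both, or a list) should give a class with the same _fields. The extra-spaces variant is my own addition to the documented examples.

```python
from collections import namedtuple

# Each spelling allowed by the documentation yields the same fields.
spellings = ['x y', 'x, y', 'x,y', '  x ,  y  ', ['x', 'y']]
for field_names in spellings:
    Point = namedtuple('Point', field_names)
    print(repr(field_names), '->', Point._fields)  # ... -> ('x', 'y')
```

A plain split(',') could not support the space-only form 'x y', since it would return the single element 'x y'.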

Upvotes: 0

Views: 55

Answers (1)

Kendas

Reputation: 2243

Execution time difference aside, these two do not do exactly the same thing.

Consider the string 'a, b, c'. Using replace + split, it would result in ['a', 'b', 'c'], while splitting on ',' would result in ['a', ' b', ' c'].
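A small sketch of that difference, plus one alternative I'm adding for comparison (stripping each piece after split(',')), which handles 'a, b, c' but still not the space-only form:

```python
from collections import namedtuple

names = 'a, b, c'

print(names.split(','))                 # ['a', ' b', ' c']
print(names.replace(',', ' ').split())  # ['a', 'b', 'c']

# Stripping after split(',') fixes the comma-and-space case ...
print([p.strip() for p in names.split(',')])    # ['a', 'b', 'c']
# ... but still fails on space-only separators, which namedtuple also accepts.
print([p.strip() for p in 'a b c'.split(',')])  # ['a b c']

# Passing the raw split(',') result to namedtuple is rejected,
# because ' b' is not a valid identifier.
try:
    namedtuple('T', names.split(','))
except ValueError as exc:
    print('ValueError:', exc)
```

So replace(',', ' ').split() is the shortest single expression that accepts commas, whitespace, or both.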

Whether one option is faster than the other is largely irrelevant, because these operations (that is, calls to namedtuple()) are generally done at import time.

So unless you are generating new namedtuple types at runtime from a dynamically generated string (not a list) of field names in a tight loop, the time difference is trivial.
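A rough sketch to check this point: time parsing the field-name string on its own against building a whole namedtuple class, which includes that parse. I would expect the second figure to be far larger on a typical CPython build, but the exact numbers depend on the machine and version. The last lines show the usual pattern of creating the type once and then only creating instances.

```python
import timeit
from collections import namedtuple

N = 1_000

# Parsing the field-name string on its own ...
parse = timeit.timeit("'x, y'.replace(',', ' ').split()", number=N)
# ... versus building a whole new namedtuple class, which includes that parse.
build = timeit.timeit("namedtuple('Point', 'x, y')",
                      globals={'namedtuple': namedtuple}, number=N)
print(f"parse only:   {parse / N * 1e9:.0f} ns per call")
print(f"namedtuple(): {build / N * 1e9:.0f} ns per call")

# Typical usage: create the type once at import time, then only build instances.
Point = namedtuple('Point', 'x, y')
points = [Point(i, i * i) for i in range(3)]
print(points)  # [Point(x=0, y=0), Point(x=1, y=1), Point(x=2, y=4)]
```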

Upvotes: 2
