python split by character vs default

Question

This is about the namedtuple function in Python's collections module. The source is at https://github.com/python/cpython/blob/master/Lib/collections/__init__.py

def namedtuple(typename, field_names, *, verbose=False, rename=False, module=None):

    # Validate the field names.  At the user's option, either generate an error
    # message or automatically replace the field name with a valid name.
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()

The last line of the code above uses replace(',', ' ').split() rather than split(','). I'm wondering what the reason for that is.

Here is the test code to measure the time cost:

from random import randrange


def create_str(n):
    a = []
    for _i in range(n):
        a.append(str(randrange(101)))

    return ','.join(a)


s = create_str(1000)

# print(s)


def test_a():
    s.split(',')


def test_b():
    s.replace(',', ' ').split()


if __name__ == '__main__':
    import timeit
    print(['test_a: ', timeit.timeit("test_a()", setup="from __main__ import test_a")])
    print(['test_b: ', timeit.timeit("test_b()", setup="from __main__ import test_b")])

The output from the above:

['test_a: ', 59.938546671997756]
['test_b: ', 68.51630863297032]

With s = create_str(10), I got the following:

['test_a: ', 0.9246872899821028]
['test_b: ', 1.2178910280345008]

With s = create_str(100), I got the following:

['test_a: ', 6.570624853018671]
['test_b: ', 7.8685859580291435]

test_b is slower in every case.
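For reference, timeit.timeit runs the statement 1,000,000 times by default, which is why the totals above run to tens of seconds. A shorter variant of the same measurement might look like the sketch below. The helper name time_both and the number and repeat values are my own choices for illustration, not part of the original test.

```python
import timeit
from random import randrange


def create_str(n):
    # Same shape of input as the test code above: n comma-separated integers.
    return ','.join(str(randrange(101)) for _ in range(n))


def time_both(n, number=10_000, repeat=5):
    # Hypothetical helper: times both expressions on the same input string.
    s = create_str(n)
    # Take the minimum over several repeats to reduce noise from other processes.
    t_a = min(timeit.repeat(lambda: s.split(','),
                            number=number, repeat=repeat))
    t_b = min(timeit.repeat(lambda: s.replace(',', ' ').split(),
                            number=number, repeat=repeat))
    return t_a, t_b


if __name__ == '__main__':
    for n in (10, 100, 1000):
        t_a, t_b = time_both(n)
        print(n, t_a, t_b)
```

The absolute numbers will differ from machine to machine, so only the ratio between the two timings is meaningful.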

Updated:

https://docs.python.org/3/library/collections.html#collections.namedtuple says the following:

The field_names are a sequence of strings such as ['x', 'y']. Alternatively, field_names can be a single string with each fieldname separated by whitespace and/or commas, for example 'x y' or 'x, y'.
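To see what that documented behaviour means for the two expressions, here is a small sketch comparing them on the example strings from the docs. The function names split_on_comma and split_like_namedtuple are made up for this illustration.

```python
def split_on_comma(field_names):
    # Splits on commas only; whitespace around a name stays in the piece.
    return field_names.split(',')


def split_like_namedtuple(field_names):
    # Same expression as in the namedtuple source quoted above:
    # turn commas into spaces, then split on any run of whitespace.
    return field_names.replace(',', ' ').split()


for text in ('x y', 'x, y', 'x,y'):
    print(repr(text), split_on_comma(text), split_like_namedtuple(text))
```

Only the second form handles all three inputs the same way: 'x y' is not split at all by split(','), and 'x, y' leaves a leading space on ' y'.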

