Prithvidiamond
Prithvidiamond

Reputation: 367

Why is this Python function to sort integers contained as strings slower than this?

I have two python functions for sorting a list of integers contained as strings. They are follows:

import random

n = 10000000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)

def sort_alg1(unsorted):
    unsorted.sort(key = lambda x: int(x))
    return unsorted

def sort_alg2(unsorted):
    l=list(map(int,unsorted))
    l.sort()
    s=list(map(str,l))
    return s

print(sort_alg1(unsorted))
print(sort_alg2(unsorted))

Both work as expected. However, according to my profiler (I am using the ever-popular line_profiler by Robert Kern), the first function i.e. sort_alg1 is more than 3 times slower in execution as sort_alg2. Now that wouldn't be a big problem if I could pinpoint the reason for this, but I am unable to. I have tried looking up differences in the built-in sort() method and map() function, lambda, etc. all to no avail. If someone could educate me as to why this is happening, that would be really great!

Upvotes: 1

Views: 75

Answers (2)

Booboo
Booboo

Reputation: 44148

unsorted.sort in function sort_alg1 sorts the list in place so that sort_alg2 is not starting out with quite the same version of the unsorted list. Moreover, once sort_alg2 executes the statement l = list(map(int, unsorted)), l is now a completely sorted list if we are sorting by integer values. So l.sort() is going to run in trivial time.

Note the assert statement in the sort_alg2 function:

import random

n = 10000000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)

def sort_alg1(unsorted):
    unsorted.sort(key = lambda x: int(x))
    return unsorted

def sort_alg2(unsorted):
    l=list(map(int,unsorted))
    # make a copy of l:
    l2 = l.copy()
    l.sort()
    assert(l == l2) # proof!
    s=list(map(str,l))
    return s

sort_alg1(unsorted)
sort_alg2(unsorted)

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195448

Doing some benchmark:

from timeit import timeit
import random

n = 1_000_000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)


def sort_alg1(unsorted):
    unsorted.sort(key=lambda x: int(x))
    return unsorted


def sort_alg2(unsorted):
    l = list(map(int, unsorted))
    l.sort()
    s = list(map(str, l))
    return s


t1 = timeit(lambda: sort_alg1(unsorted), number=1)
t2 = timeit(lambda: sort_alg2(unsorted), number=1)

print(t1)
print(t2)

Prints:

0.4568827738985419
0.2486396620515734

So it seems that sort_alg2 is faster. But the reason is that sort_alg2 receives already sorted array from sort_alg1. If you slightly change the benchmark:

t1 = timeit(lambda: sort_alg1(unsorted), number=1)
random.shuffle(unsorted)                      # <---- shufle again the array
t2 = timeit(lambda: sort_alg2(unsorted), number=1)

print(t1)
print(t2)

Prints:

0.456114097032696
0.5958397497888654

So first function is faster.

Upvotes: 1

Related Questions