subhanshu kumar
subhanshu kumar

Reputation: 392

logic behind sorted() function in python

I have an array of 2k elements and i am trying to sort it.

When i call sorted(arr) it takes very less time, i.e 3-4 sec to sort an array.

but when i am building my own sorting function(merge sort) with time complaxity nlogn it is taking more then a min to sort it.

I also tried with quick sort and bubble sort but none of the algo is able to reduce the time to this extent.

I am not able to understand that how builtin function works and how it is taking so less time. what is the logic behind it?

Upvotes: 0

Views: 2131

Answers (2)

Stef
Stef

Reputation: 15505

The complexity you know is asymptotic, and there is a multiplicative constant

Python code is optimized by people who know python's implementation very well. The fact that merge sort's complexity is proportional to n log n will be apparent if you time your function on a list of 1000 elements, a list of 10000, a list of 10^5, a list of 10^6, and plot the resulting times on a graph. The curve should be close to the curve of k n log n for some constant k. Note that I said k n log n and not just n log n. There is a multiplicative constant.

An analogy: reading pages from books

Let me give you a simple example of two algorithms which "theoretically" have exactly the same complexity, yet one is obviously much faster than the other. I have 100 books on a shelf in front of me; every book has 100 pages; I want to know if the sentence "She sells seashells on the shore" is written in one of these books.

Algorithm 1

for p in range(100):
    for b in range(100):
        read page p in book b

Algorithm 2

for b in range(100):
    for p in range(100):
        read page p in book b

A simple theoretical, abstract analysis of the complexities of these two algorithms is clear: they both require reading a total of 10000 pages. None of these two programs is obviously faster than the other. Do you agree? Yet any human asked to read pages from book will tell you it is much, much, much faster to read them using algorithm 2.

Merge sort, list slices and copy

The same thing goes for sorting lists. Python's implementation of the sorted() function takes into account lots of small details about the python language that make some operations faster than others. Your implementation of merge sort makes no such optimization and is much slower. Here is the code of a slow merge sort:

def mergesort(l):
  if len(l) > 1:
    middle = len(l) // 2 + 1
    left = mergesort(l[:middle])
    right = mergesort(l[middle:])
    l = merge(left, right)
  return l

Can you tell why this would be slow? The slicing operations l[:middle] and l[middle:] are copy operations. We copy all the elements of l into two new lists and we pass these lists to the recursive calls. An optimized version of merge sort would avoid the copy operation, which wastes a huge amount of time.

Quicksort in practical implementations

You mention quicksort. Quicksort is particularly interesting. Have you tried to sort the cards of a deck of playing cards? Do you use something similar to quicksort or mergesort to do that? If you try, you'll find out that quicksort and mergesort's "divide and conquer" approaches are very well worth it when you have lots of cards; but if you have 20 cards or less, insertion sort is actually faster. Why? Because quicksort is complicated. You waste time setting up the cards correctly. We call this time "overhead". Quicksort and merge sort are proportional to n log n, but they have a high overhead. This overhead is worth it for large lists, but not worth it for small lists. But wait a minute!! Because of quicksort's recursive approach, if the initial list is large, then quicksort will divide it into lots of small lists! And then it'll waste time on every small list!

In some languages, the sorting function is implemented using a variant of quicksort. But you'll find out that in every such implementation, the algorithm stops using quicksort as soon as the list is smaller than a certain size. When the sublists in the recursive calls become small enough, a simpler sort is used instead.

The implementation of sorted in CPython

If you still want to know precisely what is the difference between your sorting functions and python's sorting functions, you can read about it.

The sorted function in python is implemented using a sorting algorithm called Timsort. You can find a description of this algorithm on Wikipedia, and the actual implementation in the source code of CPython, which is open source.

Upvotes: 1

Sniggerfardimungus
Sniggerfardimungus

Reputation: 11831

Python's sorted () is written in C and compiled to native code. When you write your own sort, it's run entirely in the python interpreter, which is at least an order of magnitude slower - nearly two. It's hard to say without seeing your code, but there's also a chance that your Python sort is doing a bunch of allocation behind the scenes...

Upvotes: 2

Related Questions