Reputation:
I wrote a test program to see how well PyPy can optimize Python code and make it run faster. It is a non-in-place quicksort, and it is supposed to run slowly enough to make the difference visible. Simply replacing python with pypy actually made it slower, from 16 seconds to 25 seconds (a minimal way to time just the sort is sketched after the code below). I searched a bit and found the opt option, but I can't find a way to apply it to pypy. I'm quite new to Python, so please help me a bit.
import sys

def sqsort(xxs):
    if len(xxs) == 1 or len(xxs) == 0:
        return xxs
    x = xxs[0]
    xs = xxs[1:]
    l = []
    g = []
    for x2 in xs:
        if x2 < x:
            l.append(x2)
        if x2 >= x:
            g.append(x2)
    return sqsort(l) + [x] + sqsort(g)

sys.setrecursionlimit(30000)
l = list(reversed(range(15000)))
print(l)
print(sqsort(l))
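For reference, a minimal way to time just the sort (assuming the definitions above, including the setrecursionlimit call, have already run) would be something like:

import time

# Time only the sorting work, leaving out the large print calls.
start = time.perf_counter()
result = sqsort(list(reversed(range(15000))))
print("sorted %d items in %.2f seconds" % (len(result), time.perf_counter() - start))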
Upvotes: 0
Views: 526
Reputation: 12900
Not a full answer, but indeed the problem is recursion, which PyPy is not good at. Here is the same algorithm rewritten to use recursion only for the shorter of the two sublists (either l or g), and iteration for the longer one. This version is still recursive, but because each recursive call handles at most half of the remaining elements, the recursion depth is guaranteed to be O(log(n)) instead of O(n). It is now 4-5x faster in PyPy.

Note that we can't say that the overall time of this algorithm (in either version) is really O(n log(n)), because it is full of list concatenations, which take time too. You can't treat Python's lists like you would treat Haskell's or Lisp's "cons" chained lists; in Python, lists are variable-sized arrays. (An in-place sketch that avoids the concatenations is shown after the code below.)
def sqsort(xxs):
    left, right = [], []
    while True:
        if len(xxs) == 1 or len(xxs) == 0:
            return left + xxs + right
        x = xxs[0]
        xs = xxs[1:]
        l = []
        g = []
        for x2 in xs:
            if x2 < x:
                l.append(x2)
            if x2 >= x:
                g.append(x2)
        if len(l) <= len(g):
            left += sqsort(l) + [x]
            xxs = g
        else:
            right = [x] + sqsort(g) + right
            xxs = l

l = list(reversed(range(15000)))
print(l)
print(sqsort(l))
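To illustrate the remark about concatenation costs, here is a sketch (not part of the measured comparison above, and not benchmarked) of an in-place variant that partitions within a single list instead of building new ones, and, like the version above, recurses only into the smaller partition so the recursion depth stays O(log(n)). The middle-element pivot is just one reasonable choice:

def sqsort_inplace(xs, lo=0, hi=None):
    # Sorts xs[lo:hi+1] in place and returns xs.
    if hi is None:
        hi = len(xs) - 1
    while lo < hi:
        # Use the middle element as pivot; move it to the end,
        # then do a Lomuto-style partition around it.
        mid = (lo + hi) // 2
        xs[mid], xs[hi] = xs[hi], xs[mid]
        pivot = xs[hi]
        i = lo
        for j in range(lo, hi):
            if xs[j] < pivot:
                xs[i], xs[j] = xs[j], xs[i]
                i += 1
        xs[i], xs[hi] = xs[hi], xs[i]
        # Recurse into the smaller partition, iterate on the larger one,
        # so the recursion depth is bounded by O(log(n)).
        if i - lo < hi - i:
            sqsort_inplace(xs, lo, i - 1)
            lo = i + 1
        else:
            sqsort_inplace(xs, i + 1, hi)
            hi = i - 1
    return xs

It mutates its argument, so you would call it as xs = list(reversed(range(15000))); sqsort_inplace(xs), after which xs is sorted.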
Upvotes: 1