marshall

Reputation: 2483

Why is my pyp (Python) one-liner so slow?

I am trying to convert my Perl one-liners to pyp. My first attempt, kindly given to me as the answer to another question, was:

pyp "mm | p if n==0 else (p[:-2] + [(int(x)%12) for x in p[-2:]]) | mm"

However, this turns out to be amazingly slow. If I create a test file using

import random

for j in xrange(50000):
    print ",".join(str(i) for i in [random.choice(xrange(1000)) for i in xrange(8)])

and then run

time (cat testmedium.txt |~/.local/bin/pyp "mm | p if n==0 else (p[:-2] + [(int(x)%12) for x in p[-2:]]) | mm" > /dev/null)

I get

real    1m27.889s
user    1m26.941s
sys     0m0.688s

However, the equivalent in Perl is almost instant.

time (cat testmedium.txt |perl -l -a -F',' -p -e'if ($. > 1) { $F[6] %=12; $F[7] %= 12;$_ = join(q{,}, @F[6,7]) }' > /dev/null)

real    0m0.196s
user    0m0.192s
sys     0m0.012s

For larger test files the difference is even more dramatic.

Upvotes: 0

Views: 674

Answers (2)

John 9631

Reputation: 577

This is an indirect answer to your question @marshall.

First, for me the biggest advantage of pyp is not having to learn another language, and I don't generally deal with large amounts of data, so it's a good fit for my needs. Also, I understand that there have been some speed-oriented optimizations to pyp, which may have affected the problem you describe.

I wondered if pypy might provide a faster version of pyp so I created an alias for pyp:

alias 'pl=pypy /usr/bin/pyp'

Then I ran this command with both pyp and pl

lr | pl "'doc',p, p.replace('e','EEE')+'.xpg' | pp.reverse() | ''.join(p)" | pl "d|u"

where lr is an alias for ls -R + ls -A just to create a long recursive list to time the operation.

The results were 8.04 seconds for pyp using Python 2.7.6 and 4.46 seconds for the pl alias. For a much larger set of directories, the times were 470 and 250 seconds respectively. Both Python and PyPy run at 100% of one core during this operation.

So if you have pypy on your system, there would seem to be a substantial performance gain possible with a simple alias.
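If you want to repeat this kind of comparison on your own machine, a small timing harness is one way to do it. This is only a sketch written for Python 3: the time_command helper and the "baseline" entry are hypothetical names made up for illustration, and the stand-in command just reads and discards its input. You would swap in your own pyp and pypy command lines.

```python
import subprocess
import sys
import time


def time_command(cmd, data):
    """Run cmd with data on stdin and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, input=data, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Stand-in workload: the current interpreter reads and discards its input.
# Replace this entry with the real pyp and pypy command lines on your system.
data = b"a,b,c\n" * 1000
candidates = {
    "baseline": [sys.executable, "-c", "import sys; sys.stdin.read()"],
}

timings = {name: time_command(cmd, data) for name, cmd in candidates.items()}
for name, seconds in timings.items():
    print("%s: %.3f s" % (name, seconds))
```

Wall-clock time from a single run is noisy, so it is worth running each command a few times before drawing conclusions.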

Upvotes: 0

Amber

Reputation: 526573

This code...

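import sys

for index, line in enumerate(sys.stdin):
    # Strip the trailing newline so print does not add a second one.
    line = line.rstrip('\n')
    if index == 0:
        # Pass the first line through unchanged.
        print line
    else:
        values = line.split(',')
        # Reduce the last two fields modulo 12.
        values[-2:] = [str(int(x) % 12) for x in values[-2:]]
        print ','.join(values)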

runs in under a second for me (using a test file generated with the same method you did):

$ time (cat test.txt | python foo.py > /dev/null)

real    0m0.363s
user    0m0.339s
sys     0m0.032s

So if you're running into issues, it's probably an inefficiency with something pyp is trying to do.
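For anyone on Python 3, the same transformation can be sketched as a small generator function, which also makes it easy to check on a couple of sample lines. The name mod12_last_two and the sample data are made up for illustration, and this says nothing about how pyp works internally.

```python
def mod12_last_two(lines):
    """Yield each line with its last two comma-separated fields reduced mod 12.

    The first line is passed through unchanged, like the n==0 case in the
    pyp one-liner.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index == 0:
            yield line
            continue
        values = line.split(",")
        values[-2:] = [str(int(x) % 12) for x in values[-2:]]
        yield ",".join(values)


# Two identical sample rows: the first is kept as-is, the second is transformed.
sample = ["1,2,3,4,5,6,700,25\n", "1,2,3,4,5,6,700,25\n"]
result = list(mod12_last_two(sample))
print(result)
```

To use it as a filter, iterate over mod12_last_two(sys.stdin) and print each item.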

Upvotes: 4
