Reputation: 2483
I am trying to convert my Perl one-liners to pyp. My first attempt, kindly given to me as the answer to another question, was
pyp "mm | p if n==0 else (p[:-2] + [(int(x)%12) for x in p[-2:]]) | mm"
However, this turns out to be amazingly slow. If I create a test file using
import random
for j in xrange(50000):
    print ",".join(str(i) for i in [random.choice(xrange(1000)) for i in xrange(8)])
and then run
time (cat testmedium.txt |~/.local/bin/pyp "mm | p if n==0 else (p[:-2] + [(int(x)%12) for x in p[-2:]]) | mm" > /dev/null)
I get
real 1m27.889s
user 1m26.941s
sys 0m0.688s
However, the equivalent in Perl is almost instant.
time (cat testmedium.txt |perl -l -a -F',' -p -e'if ($. > 1) { $F[6] %=12; $F[7] %= 12;$_ = join(q{,}, @F[6,7]) }' > /dev/null)
real 0m0.196s
user 0m0.192s
sys 0m0.012s
For larger test files the difference is even more dramatic.
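For anyone reproducing this on Python 3, where xrange and the print statement are gone, the test-file generator can be sketched roughly as below. The function name make_test_lines and its rows, cols and seed parameters are my own additions for illustration; only the shape of the data (8 comma-separated integers from 0 to 999 per line, 50000 lines) comes from the snippet above.

```python
import random


def make_test_lines(rows=50000, cols=8, seed=None):
    """Yield `rows` lines of `cols` comma-separated random integers in [0, 1000)."""
    # The seed is an assumption, added only so that runs can be repeated.
    rng = random.Random(seed)
    for _ in range(rows):
        yield ",".join(str(rng.randrange(1000)) for _ in range(cols))
```

Printing each yielded line and redirecting stdout to testmedium.txt should give a file comparable to the one used above; raising rows gives the larger test files.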
Upvotes: 0
Views: 674
Reputation: 577
This is an indirect answer to your question @marshall.
First, I would say that for me the biggest advantage of pyp is not having to learn another language, and I don't generally deal with large amounts of data, so it's a good fit for my needs. I also understand that there have been some speed-oriented optimizations to pyp, which may have affected the problem you describe.
I wondered if pypy might provide a faster version of pyp so I created an alias for pyp:
alias 'pl=pypy /usr/bin/pyp'
Then I ran this command with both pyp and pl:
lr | pl "'doc',p, p.replace('e','EEE')+'.xpg' | pp.reverse() | ''.join(p)" | pl "d|u"
where lr is an alias for ls -R + ls -A, used only to produce a long recursive listing for timing the operation.
The results were 8.04 seconds for pyp using Python 2.7.6 and 4.46 seconds for the pl alias. For a much larger set of directories it was 470 and 250 seconds, respectively. Python runs at 100% of one core during this operation, as does PyPy.
So if you have pypy on your system there would seem to be a substantial performance gain possible with a simple alias.
Upvotes: 0
Reputation: 526573
This code...
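import sys
for index, line in enumerate(sys.stdin):
    line = line.rstrip('\n')  # drop the newline so print does not double it
    if index == 0:
        print line  # first line passes through unchanged
    else:
        values = line.split(',')
        # reduce the last two fields modulo 12
        values[-2:] = [str(int(x) % 12) for x in values[-2:]]
        print ','.join(values)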
runs in under a second for me (using a test file generated with the same method you did):
$ time (cat test.txt | python foo.py > /dev/null)
real 0m0.363s
user 0m0.339s
sys 0m0.032s
So if you're running into issues, it's probably an inefficiency in something pyp is trying to do.
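In case it helps anyone on Python 3, here is a rough port of the same loop. It is a sketch under a few assumptions of mine: the names reduce_last_two and main and the modulus parameter are hypothetical, the first line is passed through untouched as in the script above, and every later line is assumed to end in at least two integer fields.

```python
import sys


def reduce_last_two(lines, modulus=12):
    """Yield lines with the last two comma-separated fields taken modulo `modulus`.

    The first line is yielded unchanged, mirroring the index == 0 branch above.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index == 0:
            yield line
            continue
        values = line.split(",")
        values[-2:] = [str(int(x) % modulus) for x in values[-2:]]
        yield ",".join(values)


def main():
    # Stream stdin to stdout one line at a time, so memory use stays flat.
    for out in reduce_last_two(sys.stdin):
        print(out)
```

Calling main() at the bottom of a script and piping the test file into it under python3 should behave like the Python 2 version; I have not timed this port.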
Upvotes: 4