kleite

Reputation: 337

Speedwise, how does Pharo compare to Python3?

I'm not very experienced in programming, but I know a little Python 3, and now I'm taking my first baby steps with Pharo. I'm still not familiar with object-oriented programming or the class browser, but I've already gone through the ProfStef tutorial, and I'm toying with small programs in the Playground to get familiar with the syntax.

One of the first things I was curious about was how the two languages compare in terms of speed, as I read somewhere that Pharo has a JIT compiler built in. So I wrote a whimsical little script in both languages: it generates 8 million numbers, filters them, calculates 1/sqrt(x) for each survivor, and sums the results; it then repeats this process one hundred times, shifting the interval slightly each time, sums those results again at the end, and times the whole thing. It's not a proper benchmark, just an exercise to get an order-of-magnitude estimate, but I tried to keep both versions as similar as possible.

Python 3 version:

import time, math

mega = lambda n: sum([1/math.sqrt(1 + i + n) for i in range(8000000) if (i + 1) // 3 == 0])
start = time.time()
print(sum([mega(n + 1) for n in range(100)]))
stop = time.time() - start
print(stop)

Results with Python 3.8.5 (default, Jul 28 2020, 12:59:40):

34.7701230607214
52.75216603279114

Pharo 8 version:

| range start stop |

range := [:n | (((1 to: 8000000) select: [:j | (j quo: 3) = 0]) collect: [:i | 1 / (n + i) sqrt]) sum].
start := DateAndTime now.
Transcript show: (((1 to: 100) collect: [:n | range value: n]) sum); cr.
stop := (DateAndTime now - start) asSeconds.
Transcript show: stop; cr.

Results on Pharo-8.0.0+build.1141.sha.1b7a8d8203fce2a57794451f555bba4222614081 (64 Bit):

34.7701230607214
45

As I expected, the Pharo version ran faster, but not by a large margin: 45 seconds against a bit over 52 seconds for Python, about 13% less time. So I guess their speeds are in the same order of magnitude. Is this the typical situation?
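As a side note, for slightly steadier numbers on the Python side, the standard timeit module repeats a measurement and lets you take the best run. A reduced-size sketch (I shrank the range from 8 000 000 to 8 000 so it finishes in a moment; everything else mirrors the script above):

```python
import timeit, math

# Same computation as the script above, with a smaller range (8_000
# instead of 8_000_000) so the measurement finishes quickly.
mega = lambda n: sum(1 / math.sqrt(1 + i + n)
                     for i in range(8_000)
                     if (i + 1) // 3 == 0)

# Best of 5 single-pass repetitions.
best = min(timeit.repeat(lambda: sum(mega(n + 1) for n in range(100)),
                         number=1, repeat=5))
print(best)
```

Taking the minimum of several repetitions filters out one-off interference from the OS scheduler, which matters when the margin between two systems is as thin as 13%.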

Upvotes: 3

Views: 1240

Answers (3)

andrzej

Reputation: 1

My results four years later:

Python 3.12.1

34.770123060721424
49.19317078590393

Pharo 12.0.0

34.7701230607214
30

Squeak6.0

34.7701230607214
22

And another player, Racket v8.15:

34.770123060721424
2.4

#lang racket/base
(require math/base)

(define mega
  (lambda (n)
    (sum (for/list ([i (in-range 8000000)]
                    #:when (zero? (quotient (+ i 1) 3)))
           (/ 1 (sqrt (+ 1 i n)))))))

(define start (current-milliseconds))

(exact->inexact (sum (for/list ([n (in-range 100)])
                       (mega (+ n 1)))))

(displayln (exact->inexact (/ (- (current-milliseconds) start) 1000)))

Upvotes: -1

Stephan Eggermont

Reputation: 15907

While still not very idiomatic Smalltalk,

| range start stop |

range := [:n | (1 to: 8000000) inject: 0 into: [:sum :in |
    (in quo: 3) = 0
        ifTrue: [1 / (n + in) sqrt + sum]
        ifFalse: [sum]]].
start := DateAndTime now.
Transcript show: (((1 to: 100) collect: [:n | range value: n]) sum); cr.
stop := (DateAndTime now - start) asSeconds.
Transcript show: stop; cr.

is about 2.5 times faster, which kind of underscores the point Leandro made.
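For comparison, a rough Python analogue of this single-pass inject:into: fold might look like the following (my translation, not from the thread; the limit parameter is mine, added so the function is easy to test on small inputs):

```python
import math

def mega(n, limit=8_000_000):
    # One pass, accumulating as we go -- no intermediate collections,
    # mirroring the inject:into: version above.
    total = 0.0
    for i in range(1, limit + 1):
        if i // 3 == 0:  # true only for i = 1 and 2, as in the original filter
            total += 1 / math.sqrt(n + i)
    return total
```

Whether a manual accumulation loop like this beats the comprehension-based version in CPython depends on the interpreter version, so it is worth timing both rather than assuming the single-pass form wins there too.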

Upvotes: 3

Leandro Caniglia

Reputation: 14858

This kind of test doesn't tell us much. The main reason is that the bulk of the computation consists of repeatedly sending the same messages to instances of the same classes (quo: and = in the select:; /, + and sqrt in the collect:; etc.). This means that a time-consuming internal operation such as method lookup only takes place once and then gets trapped in the inline caches. As a result, you may have a system that outperforms another when running these benchmarks yet is much slower when running a "real" application.

Besides (mono- or polymorphic) inline caches, which reduce the need for method lookup, other techniques that make a difference are the performance of the garbage collector, method inlining (which replaces send sites with a copy of the target code), register allocation (which minimizes memory access), the performance of the become: message, and so on. This multitude of factors makes it desirable to measure more sophisticated code snippets that exercise known bottlenecks such as the ones I just mentioned. Sometimes a small change can expose otherwise hidden strengths (or weaknesses) of a system.

So my suggestion is that, for this kind of analysis, you try a bit harder and design tests aimed at measuring how the system responds to a particular kind of stress.
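A toy illustration of the mono- vs. polymorphic call-site effect described above, written in Python rather than Smalltalk (class names are made up, and the size of the effect depends entirely on the VM; CPython 3.11+'s adaptive specialization behaves loosely like an inline cache):

```python
import time

class A:
    def val(self):
        return 1

class B:
    def val(self):
        return 1

def total(objs):
    # The call site `o.val()` sees a single receiver class in the
    # monomorphic case and two alternating classes in the polymorphic case.
    s = 0
    for o in objs:
        s += o.val()
    return s

mono = [A() for _ in range(1_000_000)]
poly = [A() if i % 2 == 0 else B() for i in range(1_000_000)]

for name, objs in (("monomorphic", mono), ("polymorphic", poly)):
    t0 = time.perf_counter()
    total(objs)
    print(name, time.perf_counter() - t0)
```

Both loops compute the same sum; any gap between the two timings comes purely from how the runtime handles the call site, which is exactly why a benchmark that only ever exercises monomorphic sends can flatter one VM over another.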

Upvotes: 8
