Reputation: 2945
I am trying to encode a small lambda calculus with algebraic datatypes in Scheme. I want it to use lazy evaluation, for which I tried to use the primitives delay
and force
. However, this has a large negative impact on the performance of evaluation: the execution time on a small test case goes up by a factor of 20x.
While I did not expect laziness to speed up this particular test case, I did not expect a huge slowdown either. My question is thus: What is causing this huge overhead with lazy evaluation, and how can I avoid this problem while still getting lazy evaluation? I would already be happy to get within 2x the execution time of the strict version, but faster is of course always better.
Below are the strict and lazy versions of the test case I used. The test deals with natural numbers in unary notation: it constructs a sequence of 2^24
suc
s followed by a zero
and then destructs the result again. The lazy version was constructed from the strict version by adding delay
and force
in appropriate places, and adding let
-bindings to avoid forcing an argument more than once. (I also tried a version where zero
and suc
were strict but other functions were lazy, but this was even slower than the fully lazy version so I omitted it here.)
I compiled both programs using compile-file
in Chez Scheme 9.5 and executed the resulting .so
files with petite --program
. Execution time (user only) for the strict version was 0.578s, while the lazy version takes 11,891s, which is almost exactly 20x slower.
(define zero 'zero)
(define (suc x) (cons 'suc x))
(define one (suc zero))
(define two (suc one))
(define three (suc two))
(define (twice m)
(if (eq? m zero)
zero
(suc (suc (twice (cdr m))))))
(define (pow2 m)
(if (eq? m zero)
one
(twice (pow2 (cdr m)))))
(define (consume m)
(if (eq? m zero)
zero
(consume (cdr m))))
(consume (pow2 (twice (twice (twice three)))))
(define zero (delay 'zero))
(define (suc x) (delay (cons 'suc x)))
(define one (suc zero))
(define two (suc one))
(define three (suc two))
(define (twice m)
(delay
(let ((mv (force m)))
(if (eq? mv 'zero)
(force zero)
(force (suc (suc (twice (cdr mv)))))))))
(define (pow2 m)
(delay
(let ((mv (force m)))
(if (eq? mv 'zero)
(force one)
(force (twice (pow2 (cdr mv))))))))
(define (consume m)
(delay
(let ((mv (force m)))
(if (eq? mv 'zero)
(force zero)
(force (consume (cdr mv)))))))
(force (consume (pow2 (twice (twice (twice three))))))
Upvotes: 8
Views: 943
Reputation: 820
One can see statistics for the two phases of the test program using ChezScheme's (time ...)
procedure:
$ scheme
Chez Scheme Version 9.5.2
> (load-program "strict.ss")
(time (pow2 (twice (...))))
21 collections
0.695561822s elapsed cpu time, including 0.521065634s collecting
0.695607000s elapsed real time, including 0.521191000s collecting
672586992 bytes allocated, including 236483824 bytes reclaimed
(time (consume u2^24))
no collections
0.037766347s elapsed cpu time
0.037762000s elapsed real time
0 bytes allocated
and for the lazy version:
$ scheme
> (load-program "lazy.ss")
(time (pow2 (twice (...))))
no collections
0.000000000s elapsed cpu time
0.000000000s elapsed real time
400 bytes allocated
(time (force (consume u2^24)))
572 collections
11.997971385s elapsed cpu time, including 10.798406971s collecting
12.012723000s elapsed real time, including 10.813517000s collecting
4832215216 bytes allocated, including 4460306000 bytes reclaimed
So 90% of time is collecting. Adjusting collector parameters may improve this, eg:
(collect-trip-bytes 1000000)
(collect-generation-radix (greatest-fixnum))
(heap-reserve-ratio 2.0)
(these values halve lazy time OMM)
One might also replace ChezScheme's delay
and force
with stripped down versions:
(import (except (chezscheme) delay force))
(define (make-promise p)
(let ([value (box p)])
(lambda ()
(when (box? value)
(let ([x ((unbox value))])
(when (box? value)
(set! value x))))
value)))
(define-syntax delay
(syntax-rules ()
[(_ expr) (make-promise (lambda () expr))]))
(define (force promise)
(promise))
(add above to the beginning of lazy.ss)
NB these have no error checking, and don't handle multiple values or lazy boxes.
(The ChezScheme implementation is here)
With these changes the lazy version is about 4x slower than strict:
$ scheme
> (load-program "lazy.ss")
(time (pow2 (twice (...))))
no collections
0.000000000s elapsed cpu time
0.000000000s elapsed real time
336 bytes allocated
(time (force (consume u2^24)))
3813 collections
2.977003428s elapsed cpu time, including 2.175818398s collecting
2.977292000s elapsed real time, including 2.179504000s collecting
4029652320 bytes allocated, including 2414247968 bytes reclaimed
Upvotes: 1
Reputation: 15793
What you are trying to do is not encode a small lambda calculus with algebraic datatypes
but you try to encode the Peano arithmetics, which is the first step up to "a small lambda".
I tried to write you some code that does it in a "faster way". Because I do not use the special forms force
and delay
in my code I used instead thunks to encode their logic.
(define succ
(lambda (x)
(lambda ()
(cons 'succ x))))
(define zero (lambda () 'zero))
(define one (succ zero))
(define two (succ one))
(define three (succ two))
(define twice
(lambda (n)
(define twice
(lambda (k)
(if (eq? 'zero k)
n
(succ (twice ((cdr k)))))))
(twice (n))))
(define pow2
(lambda (n)
(if (eq? 'zero n)
one
(twice (pow2 ((cdr n)))))))
(define print10
(lambda (n)
(define toten
(lambda (n)
(if (eq? n 'zero)
0
(+ 1 (toten ((cdr n)))))))
(display (toten (n)))
(newline))))
(print10 zero)
(print10 one)
(print10 two)
(print10 three)
(print10 (twice three))
(print10 (pow2 (zero)))
(print10 (pow2 (one)))
(print10 (pow2 (two)))
(print10 (pow2 (three)))
A test session should look so:
% mit-scheme --silent <peano.scm
0
1
2
3
6
1
2
4
8
Upvotes: 0
Reputation: 1732
This sounds very like a problem that crops up in Haskell from time to time. The problem is one of garbage collection.
There are two ways that this can go. Firstly, the lazy list can be consumed as it is used, so that the amount of memory consumed is limited. Or, secondly, the lazy list can be evaluated in a way that it remains in memory all of the time, with one end of the list pinned in place because it is still being used - the garbage collector objects to this and spends a lot of time trying to deal with this situation.
Haskell can be as fast as C, but requires the calculation to be strict for this to be possible.
I don't entirely understand the code, but it appears to be recursively creating a longer and longer list, which is then evaluated. Do you have the tools to measure the amount of memory that the garbage collector is having to deal with, and how much time the garbage collector runs for?
Upvotes: 0