Jesper
Jesper

Reputation: 2945

How to use of laziness in Scheme efficiently?

I am trying to encode a small lambda calculus with algebraic datatypes in Scheme. I want it to use lazy evaluation, for which I tried to use the primitives delay and force. However, this has a large negative impact on the performance of evaluation: the execution time on a small test case goes up by a factor of 20x.

While I did not expect laziness to speed up this particular test case, I did not expect a huge slowdown either. My question is thus: What is causing this huge overhead with lazy evaluation, and how can I avoid this problem while still getting lazy evaluation? I would already be happy to get within 2x the execution time of the strict version, but faster is of course always better.

Below are the strict and lazy versions of the test case I used. The test deals with natural numbers in unary notation: it constructs a sequence of 2^24 sucs followed by a zero and then destructs the result again. The lazy version was constructed from the strict version by adding delay and force in appropriate places, and adding let-bindings to avoid forcing an argument more than once. (I also tried a version where zero and suc were strict but other functions were lazy, but this was even slower than the fully lazy version so I omitted it here.)

I compiled both programs using compile-file in Chez Scheme 9.5 and executed the resulting .so files with petite --program. Execution time (user only) for the strict version was 0.578s, while the lazy version takes 11,891s, which is almost exactly 20x slower.

Strict version

(define zero    'zero)
(define (suc x) (cons 'suc x))

(define one   (suc zero))
(define two   (suc one))
(define three (suc two))

(define (twice m)
  (if (eq? m zero)
      zero
      (suc (suc (twice (cdr m))))))

(define (pow2 m)
  (if (eq? m zero)
      one
      (twice (pow2 (cdr m)))))

(define (consume m)
  (if (eq? m zero)
      zero
      (consume (cdr m))))

(consume (pow2 (twice (twice (twice three)))))

Lazy version

(define zero    (delay 'zero))
(define (suc x) (delay (cons 'suc x)))

(define one   (suc zero))
(define two   (suc one))
(define three (suc two))

(define (twice m)
  (delay
    (let ((mv (force m)))
      (if (eq? mv 'zero)
          (force zero)
          (force (suc (suc (twice (cdr mv)))))))))

(define (pow2 m)
  (delay
    (let ((mv (force m)))
      (if (eq? mv 'zero)
          (force one)
          (force (twice (pow2 (cdr mv))))))))

(define (consume m)
  (delay
    (let ((mv (force m)))
      (if (eq? mv 'zero)
          (force zero)
          (force (consume (cdr mv)))))))

(force (consume (pow2 (twice (twice (twice three))))))

Upvotes: 8

Views: 943

Answers (3)

mnemenaut
mnemenaut

Reputation: 820

One can see statistics for the two phases of the test program using ChezScheme's (time ...) procedure:

$ scheme
Chez Scheme Version 9.5.2
> (load-program "strict.ss")
(time (pow2 (twice (...))))
    21 collections
    0.695561822s elapsed cpu time, including 0.521065634s collecting
    0.695607000s elapsed real time, including 0.521191000s collecting
    672586992 bytes allocated, including 236483824 bytes reclaimed
(time (consume u2^24))
    no collections
    0.037766347s elapsed cpu time
    0.037762000s elapsed real time
    0 bytes allocated

and for the lazy version:

$ scheme
> (load-program "lazy.ss")
(time (pow2 (twice (...))))
    no collections
    0.000000000s elapsed cpu time
    0.000000000s elapsed real time
    400 bytes allocated
(time (force (consume u2^24)))
    572 collections
    11.997971385s elapsed cpu time, including 10.798406971s collecting
    12.012723000s elapsed real time, including 10.813517000s collecting
    4832215216 bytes allocated, including 4460306000 bytes reclaimed

So 90% of time is collecting. Adjusting collector parameters may improve this, eg:

(collect-trip-bytes 1000000)  
(collect-generation-radix (greatest-fixnum))  
(heap-reserve-ratio 2.0)

(these values halve lazy time OMM)

One might also replace ChezScheme's delay and force with stripped down versions:

(import (except (chezscheme) delay force))
        
(define (make-promise p)
  (let ([value (box p)])
    (lambda ()
      (when (box? value)
        (let ([x ((unbox value))])
          (when (box? value)
            (set! value x))))
      value)))
        
(define-syntax delay
  (syntax-rules ()
    [(_ expr) (make-promise (lambda () expr))]))
    
(define (force promise)
  (promise))

(add above to the beginning of lazy.ss)

NB these have no error checking, and don't handle multiple values or lazy boxes.
(The ChezScheme implementation is here)

With these changes the lazy version is about 4x slower than strict:

$ scheme
> (load-program "lazy.ss")
(time (pow2 (twice (...))))
    no collections
    0.000000000s elapsed cpu time
    0.000000000s elapsed real time
    336 bytes allocated
(time (force (consume u2^24)))
    3813 collections
    2.977003428s elapsed cpu time, including 2.175818398s collecting
    2.977292000s elapsed real time, including 2.179504000s collecting
    4029652320 bytes allocated, including 2414247968 bytes reclaimed

Upvotes: 1

alinsoar
alinsoar

Reputation: 15793

What you are trying to do is not encode a small lambda calculus with algebraic datatypes but you try to encode the Peano arithmetics, which is the first step up to "a small lambda".

I tried to write you some code that does it in a "faster way". Because I do not use the special forms force and delay in my code I used instead thunks to encode their logic.

(define succ
  (lambda (x)
    (lambda ()
      (cons 'succ x))))

(define zero (lambda () 'zero))
(define one (succ zero))
(define two (succ one))
(define three (succ two))

(define twice
  (lambda (n)
    (define twice
      (lambda (k)
        (if (eq? 'zero k)
          n
          (succ (twice ((cdr k)))))))
    (twice (n))))

(define pow2
  (lambda (n)
    (if (eq? 'zero n)
      one
      (twice (pow2 ((cdr n)))))))

(define print10
  (lambda (n)
    (define toten
      (lambda (n)
        (if (eq? n 'zero)
          0
          (+ 1 (toten ((cdr n)))))))
    (display (toten (n)))
    (newline))))

(print10 zero)
(print10 one)
(print10 two)
(print10 three)
(print10 (twice three))
(print10 (pow2 (zero)))
(print10 (pow2 (one)))
(print10 (pow2 (two)))
(print10 (pow2 (three)))

A test session should look so:

% mit-scheme --silent <peano.scm
0
1
2
3
6
1
2
4
8

Upvotes: 0

Francis King
Francis King

Reputation: 1732

This sounds very like a problem that crops up in Haskell from time to time. The problem is one of garbage collection.

There are two ways that this can go. Firstly, the lazy list can be consumed as it is used, so that the amount of memory consumed is limited. Or, secondly, the lazy list can be evaluated in a way that it remains in memory all of the time, with one end of the list pinned in place because it is still being used - the garbage collector objects to this and spends a lot of time trying to deal with this situation.

Haskell can be as fast as C, but requires the calculation to be strict for this to be possible.

I don't entirely understand the code, but it appears to be recursively creating a longer and longer list, which is then evaluated. Do you have the tools to measure the amount of memory that the garbage collector is having to deal with, and how much time the garbage collector runs for?

Upvotes: 0

Related Questions