Jared Smith
Jared Smith

Reputation: 21994

Getting unexpected sequence realization in Clojure

Noticed something odd while trying to troubleshoot a complex computation. I had an odd error so I started to build up the computation incrementally and pretty early on discovered that a very very long seq was being unexpectedly realized. I have a combinatoric seq of lists like so:

(0 1 2 3)
(0 1 2 4)
(0 1 2 5)

for n-choose-k for n = 143 and k = 4 somewhat unoriginally named "combos". I plan to manipulate this as a seq to keep the memory consumption reasonable but this failed:

(def combos (combinations 4 143))
(def semantically-also-combos
  (filter nil? (map identity combos)))

;; this prints instantly and uses almost no memory, as expected
(println (first combos)) ; prints (0 1 2 3)

;; this takes minutes and runs the JVM out of memory
;; without printing anything
(println (first semantically-also-combos))

According to type they are both clojure.lang.LazySeq but one works as expected and the other crashes the process. Why does the entire seq get realized just to run it through the identity function and check if it's nil?

Complete code to reproduce

(ns my-project.core
  (:gen-class))

;;; Copied from rosetta code
(defn combinations
  "If m=1, generate a nested list of numbers [0,n)
   If m>1, for each x in [0,n), and for each list in the recursion on [x+1,n), cons the two"
  [m n]
  (letfn [(comb-aux
            [m start]
            (if (= 1 m)
              (for [x (range start n)]
                (list x))
              (for [x (range start n)
                    xs (comb-aux (dec m) (inc x))]
                (cons x xs))))]
    (comb-aux m 0)))

(println "Generating combinations...")
(def combos (combinations 4 143))

(def should-also-be-combos
  (filter nil? (map identity combos)))

(defn -main
  "Calculates combos"
  [& _args]
  (println (type combos))
  (println (type should-also-be-combos)))

Upvotes: 1

Views: 98

Answers (2)

Rulle
Rulle

Reputation: 4901

Why don't you check that (filter nil? (map identity combos)) actually returns the result you would expect for much smaller arguments passed to combinations? I did that. This is what combinations returns with arguments 2 and 5:

(combinations 2 5)
;; => ((0 1) (0 2) (0 3) (0 4) (1 2) (1 3) (1 4) (2 3) (2 4) (3 4))

This is what we get with the extra lazy sequence operations of the example:

(filter nil? (map identity (combinations 2 5)))
;; => ()

What (filter nil? ...) does is to keep all elements that satisfy the predicate. And none of the elements in the input sequence is nil?, so it will scan the entire input sequence and will find none. Maybe you wanted to use (remove nil? ...), which will remove elements satisfying a predicate?

(remove nil? (map identity (combinations 2 5)))
;; => ((0 1) (0 2) (0 3) (0 4) (1 2) (1 3) (1 4) (2 3) (2 4) (3 4))

Going back to the original example, this is what I get using remove:

(first (combinations 4 143))
;; => (0 1 2 3)

(first (remove nil? (map identity (combinations 4 143))))
;; => (0 1 2 3)

My general advice is to start testing your function with "smaller" data such as 2 and 5, before going to "bigger" data such as 4 and 143.

Upvotes: 2

Alan Thompson
Alan Thompson

Reputation: 29984

I modified the example a bit as follows:

(ns tst.demo.core
  (:use tupelo.core tupelo.test))

(def cnt (atom 0))

; Copied from rosetta code
(defn combinations
  "If m=1, generate a nested list of numbers [0,n)
   If m>1, for each x in [0,n), and for each list in the recursion on [x+1,n), cons the two"
  [m n]
  (let [comb-aux (fn comb-aux
                   [m start]
                   (swap! cnt inc)
                   (when (zero? (mod @cnt 100))
                     (print \.)
                     (flush))
                   (when (zero? (mod @cnt 10000))
                     (newline)
                     (print @cnt "  "))
                   (if (= 1 m)
                     (for [x (range start n)]
                       (list x))
                     (for [x  (range start n)
                           xs (comb-aux (dec m) (inc x))]
                       (cons x xs))))]
    (comb-aux m 0)))

(println "Generating combinations...")
(def combos (combinations 4 143))

(def should-also-be-combos
  (filter nil? (map identity combos)))

and the invocation:

(dotest
  (newline)
  (spyx (type combos))
  (spyx (type should-also-be-combos))
  (newline)
  (println :1)
  (println (first combos))
  (newline)
  (println :2)
  (println (first should-also-be-combos))
  (newline))

with result:

-------------------------------
   Clojure 1.10.2    Java 15
-------------------------------

Testing tst.demo.core

(type combos)                  => clojure.lang.LazySeq
(type should-also-be-combos)   => clojure.lang.LazySeq

:1
(0 1 2 3)

:2
....................................................................................................
10000   ....................................................................................................
20000   ....................................................................................................
30000   ....................................................................................................
40000   ....................................................................................................
50000   ....................................................................................................
60000   ....................................................................................................
70000   ....................................................................................................
80000   ....................................................................................................
90000   ....................................................................................................
100000   ....................................................................................................
110000   ....................................................................................................
120000   ....................................................................................................
130000   ....................................................................................................
140000   ....................................................................................................
150000   ....................................................................................................
160000   ....................................................................................................
170000   ....................................................................................................
180000   ....................................................................................................
190000   ....................................................................................................
200000   ....................................................................................................
210000   ....................................................................................................
220000   ....................................................................................................
230000   ....................................................................................................
240000   ....................................................................................................
250000   ....................................................................................................
260000   ....................................................................................................
270000   ....................................................................................................
280000   ....................................................................................................
290000   ....................................................................................................
300000   ....................................................................................................
310000   ....................................................................................................
320000   ....................................................................................................
330000   ....................................................................................................
340000   ....................................................................................................
350000   ....................................................................................................
360000   ....................................................................................................
370000   ....................................................................................................
380000   ....................................................................................................
390000   ....................................................................................................
400000   ....................................................................................................
410000   ....................................................................................................
420000   ....................................................................................................
430000   ....................................................................................................
440000   ....................................................................................................
450000   ....................................................................................................
460000   ....................................................................................................
470000   ....................................................................................................
480000   ..........................................................................nil

with timing on my desktop computer (5 yrs old, 8 core, 32Gb RAM):

Passed all tests
Finished at 16:37:28.484 (run time: 5.354s)

So you can see the function is being invoked about 1/2 million times, which takes 5.3 seconds on my computer.

I believe you are seeing something similar to that described in Clojure Don'ts: Concat. It's recursive form also reminds me of the famous Ackerman Function, which is an example of how a seemingly innocuous function can "blow up" even for small input values.

Upvotes: 2

Related Questions