Punit Naik
Punit Naik

Reputation: 515

How to count the number of elements in an extremely large lazy seq?

I have the following (extremely large) lazy-seq:

(def lazy-list (partition-all 100000 (take 10000000000000000000 (repeat 1))))

I want to count the number of elements in it, for which I am doing the following:

(time
  (loop [ll lazy-list
         c 0]
    (if-not (seq (take 1 ll))
      c
      (recur (drop 1 ll)
             (inc c)))))

If I run this I get the following error:

Execution error (OutOfMemoryError) at user/eval2043$fn (REPL:1).
Java heap space

But if I am not holding the head anywhere, why am I seeing this OOM issue?

Upvotes: 0

Views: 157

Answers (2)

Shannon Severance
Shannon Severance

Reputation: 18410

But if I am not holding the head anywhere ... ? The head of the sequence is held by lazy-list.

Upvotes: 5

Alan Thompson
Alan Thompson

Reputation: 29958

I tried a slightly different version and it worked:

(ns tst.demo.core
  (:use demo.core tupelo.core tupelo.test))

(dotest
  (let [obj (vec (repeat 100000 2)) ]
    (loop [ll (repeat 1e8 obj)
           cnt 0]
      ; (spyx (type ll)) ; first is "clojure.lang.Repeat", rest are "clojure.lang.LazySeq"
      (if-not (seq (take 1 ll))
        (spyx cnt)
        (recur (drop 1 ll)
               (inc cnt))))
    ))

with result


-------------------------------
   Clojure 1.10.3    Java 17
-------------------------------

Testing tst.demo.core
cnt => 100000000

Ran 2 tests containing 0 assertions.
0 failures, 0 errors.

Passed all tests
Finished at 11:42:52.954 (run time: 10.706s)

Going back to your example, I tried:

(ns tst.demo.core
  (:use demo.core tupelo.core tupelo.test))

(def take-item (take 1e7 (repeat 1)))
(def lazy-list (partition-all 1e4 take-item ))

(dotest
  (spyx (type take-item))
  (spyx (type lazy-list))

  (time
    (loop [ll lazy-list
           c 0]
      (if-not (seq (take 1 ll))
        (spyx c)
        (recur (drop 1 ll)
               (inc c)))))
)

with result


-------------------------------
   Clojure 1.10.3    Java 17
-------------------------------

Testing tst.demo.core
(type take-item) => clojure.lang.LazySeq
(type lazy-list) => clojure.lang.LazySeq
c => 1000
"Elapsed time: 2787.211076 msecs"

Ran 2 tests containing 0 assertions.
0 failures, 0 errors.

Passed all tests
Finished at 11:50:42.880 (run time: 2.811s)

But note that take-item is holding on to the head of the 10 million long list of ones. I suspect that is your problem.

The moral of the story is that lazy sequences are sometimes useful, but are tricky and easy to get wrong. Then are not a miracle, can often cause unexpected bugs, and I prefer to always use concrete vectors whenever possible.

I'd recommend to only use a lazy sequence if there is no other choice possible.

Upvotes: 0

Related Questions